Learning context-free grammar rules from a set of programs

The grammar of a programming language is important because it is used in developing program analysis and modification tools. Programs are sometimes written in dialects, which are minor variations of standard languages. Grammars of standard languages are normally available, but grammars of dialects may not be. A technique for reverse engineering context-free grammar rules is presented. The proposed technique infers rules from a given set of programs and an approximate grammar is generated using an iterative approach with backtracking. The correctness of the approach is explained, and a set of optimisations is proposed to make the approach more efficient. The approach and the optimisations are experimentally verified on a set of programming languages.
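To make the idea of an iterative search with backtracking concrete, the following is a minimal toy sketch and not the technique described above. It assumes a starting grammar in Chomsky normal form, a hand-supplied pool of candidate rules, and a CYK recognizer as the parser. All names (`cyk_accepts`, `infer_rules`, `BASE_RULES`, `CANDIDATES`) and the tiny assignment language are hypothetical.

```python
def cyk_accepts(rules, start, tokens):
    """CYK recognizer. rules is a set of (lhs, rhs) pairs in Chomsky normal
    form: rhs is a 1-tuple (terminal) or a 2-tuple (two nonterminals)."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals deriving tokens[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for lhs, rhs in rules:
            if rhs == (tok,):
                table[i][0].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


def infer_rules(rules, start, samples, candidates, max_new=2):
    """Return the smallest list of candidate rules (at most max_new) whose
    addition makes every sample parse, or None if no such list exists."""
    def search(current, added, remaining, budget):
        if all(cyk_accepts(current, start, s) for s in samples):
            return added
        if budget == 0:
            return None  # dead end: the caller backtracks to the next rule
        for idx, rule in enumerate(remaining):
            found = search(current | {rule}, added + [rule],
                           remaining[idx + 1:], budget - 1)
            if found is not None:
                return found
        return None

    # Iterate over the number of new rules allowed, smallest first.
    for budget in range(max_new + 1):
        found = search(set(rules), [], list(candidates), budget)
        if found is not None:
            return found
    return None


# Hypothetical standard grammar: statements of the form  id = num ;
BASE_RULES = {
    ("S", ("ASSIGN", "SEMI")), ("ASSIGN", ("LHS", "EXPR")),
    ("LHS", ("ID", "EQ")), ("ID", ("id",)), ("EQ", ("=",)),
    ("EXPR", ("num",)), ("SEMI", (";",)),
}
# Hypothetical dialect sample that uses ':=' as well as '='
SAMPLES = [["id", "=", "num", ";"], ["id", ":=", "num", ";"]]
# Hypothetical candidate pool; the first candidate is a wrong guess
CANDIDATES = [("EXPR", (":=",)), ("EQ", (":=",))]

inferred = infer_rules(BASE_RULES, "S", SAMPLES, CANDIDATES)
print(inferred)
```

In this sketch the first candidate fails to make the dialect sample parse, so the search backtracks and tries the second. Trying smaller rule budgets first keeps the inferred addition minimal.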
