Technique for extracting keyword based rules from a set of programs

We present a novel technique for extracting a grammar from a set of programs. A grammar is needed to generate software analysis and modification tools. Most legacy applications are written in languages that are minor variations (dialects) of a standard language. A grammar of the standard language is usually available, but grammars of the dialects are not. In this paper we propose an iterative technique with backtracking for grammar extraction. The technique extracts keyword-based rules, using the CYK parsing algorithm and the LR error recovery technique to find new production rules. In each iteration a set of candidate rules is built and one rule is selected from it. The result is a grammar that parses all programs in the set.
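As background for the CYK step mentioned above, the following is a minimal sketch of a CYK recognizer, not the extraction technique itself. It assumes a grammar already in Chomsky normal form; the toy grammar, the token names, and the helper names `cyk_table` and `parses` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form for: if id then id
# UNARY maps a terminal to the nonterminals that produce it directly.
UNARY = {"if": {"IF"}, "id": {"E", "STMT"}, "then": {"THEN"}}
# BINARY maps a pair (B, C) to the nonterminals A with a rule A -> B C.
BINARY = {
    ("IF", "T1"): {"S"},
    ("E", "T2"): {"T1"},
    ("THEN", "STMT"): {"T2"},
}


def cyk_table(tokens, unary, binary):
    """Return table where table[i][j] is the set of nonterminals
    deriving tokens[i:j] (half-open span)."""
    n = len(tokens)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of width 1 come from the terminal rules.
    for i, tok in enumerate(tokens):
        table[i][i + 1] = set(unary.get(tok, ()))
    # Wider spans combine two adjacent, already-filled sub-spans.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for b, c in product(table[start][mid], table[mid][end]):
                    table[start][end] |= binary.get((b, c), set())
    return table


def parses(tokens, unary, binary, start="S"):
    """True if the start symbol derives the whole token sequence."""
    if not tokens:
        return False
    return start in cyk_table(tokens, unary, binary)[0][len(tokens)]


standard = ["if", "id", "then", "id"]
dialect = ["unless", "id", "then", "id"]  # "unless" is unknown to the grammar

standard_ok = parses(standard, UNARY, BINARY)
dialect_ok = parses(dialect, UNARY, BINARY)
# Even when the full parse fails, the table still records partial
# constituents, here the span covering "id then id".
partial = cyk_table(dialect, UNARY, BINARY)[1][4]
```

In this sketch the standard-language input is accepted and the dialect input is rejected, yet the table for the rejected input still holds the constituents that were recognized. How the paper turns such information into candidate rules is not described in the abstract, so that step is deliberately left out.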
