Inferring Grammar Rules of Programming Language Dialects

This paper addresses grammatical inference in the programming language domain. The grammar of a programming language is an important asset because many software engineering tools are built from it. Grammars are sometimes unavailable and must be inferred from source code, especially for programming language dialects. We propose an approach for inferring the grammar of a programming language when an incomplete grammar and a set of correct programs are given as input. The approach infers a set of grammar rules whose addition makes the initial grammar complete, where a grammar is complete if it parses all the input programs successfully. We also propose a rule evaluation order, i.e. an order in which the rules are evaluated for correctness; a set of rules is correct if its addition makes the grammar complete. Experiments show that the proposed rule evaluation order improves the process of grammar inference.
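The definitions of "complete" and "correct" above can be illustrated with a minimal sketch. It is not the inference algorithm or the rule evaluation order of the paper: the toy grammar, the tokenized programs, the candidate rules, and every function name below are hypothetical. The evaluation order used here (smaller rule sets first) is only a stand-in assumption.

```python
from itertools import combinations

def parses(grammar, start, tokens):
    """Return True if `start` derives `tokens` (naive bottom-up chart recognizer).

    Assumes terminal names and nonterminal names are distinct."""
    n = len(tokens)
    # A fact (symbol, i, j) means: symbol derives tokens[i:j].
    chart = {(tok, i, i + 1) for i, tok in enumerate(tokens)}

    def seq_ends(rhs, i):
        # All end positions reachable by matching the symbols of rhs from i.
        ends = {i}
        for sym in rhs:
            ends = {k for j in ends for k in range(j, n + 1)
                    if (sym, j, k) in chart}
        return ends

    changed = True
    while changed:  # iterate to a fixpoint; the set of facts is finite
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i in range(n + 1):
                    for j in seq_ends(rhs, i):
                        if (lhs, i, j) not in chart:
                            chart.add((lhs, i, j))
                            changed = True
    return (start, 0, n) in chart

def is_complete(grammar, start, programs):
    """A grammar is complete if it parses every input program."""
    return all(parses(grammar, start, p) for p in programs)

def with_rules(grammar, rules):
    """Return a copy of `grammar` extended with (lhs, rhs) rules."""
    extended = {lhs: list(alts) for lhs, alts in grammar.items()}
    for lhs, rhs in rules:
        extended.setdefault(lhs, []).append(rhs)
    return extended

def infer_rules(grammar, start, programs, candidates):
    """Return the first correct rule set, trying smaller sets first.

    The smallest-first order is an assumption of this sketch only."""
    for size in range(1, len(candidates) + 1):
        for rules in combinations(candidates, size):
            if is_complete(with_rules(grammar, rules), start, programs):
                return list(rules)
    return None

# Hypothetical incomplete grammar: it has no rule for the `print` statement.
grammar = {
    "program": [("stmt",), ("stmt", "program")],
    "stmt": [("id", "=", "expr", ";")],
    "expr": [("num",), ("id",)],
}
# Hypothetical tokenized programs; the second one uses the dialect construct.
programs = ["id = num ;".split(), "print id ; id = num ;".split()]
# Hypothetical candidate rules to evaluate for correctness.
candidates = [
    ("stmt", ("print", "id")),
    ("stmt", ("print", "expr", ";")),
    ("expr", ("print",)),
]

print(is_complete(grammar, "program", programs))              # False
print(infer_rules(grammar, "program", programs, candidates))
# [('stmt', ('print', 'expr', ';'))]
```

In this sketch the order in which rule sets are tried determines how many completeness checks run before a correct set is found, which is the kind of cost a rule evaluation order aims to reduce.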
