A survey of grammatical inference in software engineering

Grammatical inference, used successfully in a variety of fields such as pattern recognition, computational biology, and natural language processing, is the process of automatically inferring a grammar by examining the sentences of an unknown language. Software engineering can also benefit from grammatical inference. Unlike these other fields, which use grammars as a convenient tool to model naturally occurring patterns, software engineering treats grammars as first-class objects, typically created and maintained for a specific purpose by human designers. We introduce the theory of grammatical inference and review the state of the art as it relates to software engineering. We also explore a variety of applications in software engineering, including programming languages, DSLs, visual languages, and execution traces.

[46]  Maria João Varanda Pereira,et al.  Grammatical approach to problem solving , 2003, Proceedings of the 25th International Conference on Information Technology Interfaces, 2003. ITI 2003..

[47]  Faizan Javed,et al.  A memetic grammar inference algorithm for language learning , 2012, Appl. Soft Comput..

[48]  Tim Oates,et al.  Learning Deterministic Finite Automata from Interleaved Strings , 2010, ICGI.

[49]  Boris A. Trakhtenbrot,et al.  Finite automata : behavior and synthesis , 1973 .

[50]  Viljem Zumer,et al.  Extracting grammar from programs: brute force approach , 2005, SIGP.

[51]  Colin de la Higuera,et al.  A bibliographical study of grammatical inference , 2005, Pattern Recognit..

[52]  J. Larus Whole program paths , 1999, PLDI '99.

[53]  Ahmed Umar Memon,et al.  Log File Categorization and Anomaly Analysis Using Grammar Inference , 2008 .

[54]  Marjan Mernik,et al.  Embedding DSLs into GPLS: a Grammatical Inference Approach , 2011, Inf. Technol. Control..

[55]  Marjan Mernik,et al.  Graph Grammar Induction as a Parser-Controlled Heuristic Search Process , 2011, AGTIVE.

[56]  Lillian Lee,et al.  Learning of Context-Free Languages: A Survey of the Literature , 1996 .

[57]  Alexander Clark,et al.  Distributional Learning of Some Context-Free Languages with a Minimally Adequate Teacher , 2010, ICGI.

[58]  Jerome A. Feldman,et al.  On the Synthesis of Finite-State Machines from Samples of Their Behavior , 1972, IEEE Transactions on Computers.

[59]  Stefan C. Kremer,et al.  Inducing Grammars from Sparse Data Sets: A Survey of Algorithms and Results , 2003, J. Mach. Learn. Res..

[60]  Marjan Mernik,et al.  On automata and language based grammar metrics , 2010, Comput. Sci. Inf. Syst..

[61]  Jordan B. Pollack,et al.  A Stochastic Search Approach to Grammar Induction , 1998, ICGI.

[62]  Faizan Javed,et al.  Inferring context-free grammars for domain-specific languages , 2005, OOPSLA '05.

[63]  Jácome Cunha,et al.  Automatically Inferring ClassSheet Models from Spreadsheets , 2010, 2010 IEEE Symposium on Visual Languages and Human-Centric Computing.

[64]  James L. McClelland,et al.  Finite State Automata and Simple Recurrent Networks , 1989, Neural Computation.

[65]  Faizan Javed,et al.  MARS: A metamodel recovery system using grammar inference , 2008, Inf. Softw. Technol..

[66]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, CACM.

[67]  Yasubumi Sakakibara,et al.  Efficient Learning of Context-Free Grammars from Positive Structural Examples , 1992, Inf. Comput..

[68]  H. Ishizaka Polynomial Time Learnability of Simple Deterministic Languages , 1990, Machine Learning.

[69]  Taylor L. Booth,et al.  Grammatical Inference: Introduction and Survey-Part I , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70]  Enrique Vidal,et al.  Grammatical Inference: An Introduction Survey , 1994, ICGI.

[71]  Paul M. B. Vitányi,et al.  A Theory of Learning Simple Concepts Under Simple Distributions , 1989, COLT 1989.

[72]  Amos Storkey,et al.  Introduction Machine Learning and Pattern Recognition , 2014 .

[73]  Jeffrey G. Gray,et al.  Application of Metamodel Inference with Large-Scale Metamodels , 2012, Int. J. Softw. Informatics.

[74]  Colin de la Higuera,et al.  Ten Open Problems in Grammatical Inference , 2006, ICGI.

[75]  Marjan Mernik,et al.  Metamodel Recovery from Multi-tiered Domains Using Extended MARS , 2010, 2010 IEEE 34th Annual Computer Software and Applications Conference.

[76]  Bradford Starkie Inferring Attribute Grammars with Structured Data for Natural Language Processing , 2002, ICGI.

[77]  Faizan Javed,et al.  A Grammar-Based Approach to Class Diagram Validation , 2005 .

[78]  Faizan Javed,et al.  Extracting grammar from programs: evolutionary approach , 2005, SIGP.

[79]  Dana Angluin Negative results for equivalence queries , 1990, Mach. Learn..

[80]  Dana Angluin,et al.  A Note on the Number of Queries Needed to Identify Regular Languages , 1981, Inf. Control..

[81]  Martin Fowler,et al.  Domain-Specific Languages , 2010, The Addison-Wesley signature series.

[82]  Ali Arsanjani,et al.  A goal-driven approach to enterprise component identification and specification , 2002, CACM.

[83]  T. Yokomori On Polynomial-Time Learnability in the Limit of Strictly Deterministic Automata , 1995, Machine Learning.

[84]  Menno van Zaanen,et al.  Computational Grammatical Inference , 2006 .

[85]  E. Mark Gold,et al.  Language Identification in the Limit , 1967, Inf. Control..

[86]  Lawrence B. Holder,et al.  Graph Grammar Induction on Structural Data for Visual Programming , 2006, 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06).

[87]  Viljem Zumer,et al.  Grammar-Based Systems: Definition and Examples , 2004, Informatica.

[88]  Andrey Burago Learning structurally reversible context-free grammars from queries and counterexamples in polynomial time , 1994, COLT '94.

[89]  Takashi Yokomori Polynomial-time learning of very simple grammars from positive data , 1991, COLT '91.

[90]  Faizan Javed,et al.  Incrementally Inferring Context-Free Grammars for Domain-Specific Languages , 2006, SEKE.

[91]  Faizan Javed,et al.  Grammar inference algorithms and applications in software engineering , 2009, 2009 XXII International Symposium on Information, Communication and Automation Technologies.

[92]  Donald E. Knuth,et al.  Semantics of context-free languages , 1968, Mathematical systems theory.

[93]  Dana Ron,et al.  Automata learning and its applications , 1995, Technical report.

[94]  Massimiliano Di Penta,et al.  Towards the automatic evolution of reengineering tools , 2005, Ninth European Conference on Software Maintenance and Reengineering.