Algorithms for learning regular expressions from positive data

We describe algorithms that directly infer very simple forms of 1-unambiguous regular expressions from positive data. In doing so, we characterize the regular language classes that can be learned this way, both in terms of regular expressions and in terms of (not necessarily minimal) deterministic finite automata.
