On the Intersection of Context-Free and Regular Languages

The Bar-Hillel construction is a classic result in formal language theory. It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free. In the construction, the regular language is specified by a finite-state automaton. However, neither the original construction (Bar-Hillel et al., 1961) nor its weighted extension (Nederhof and Satta, 2003) can handle finite-state automata with ε-arcs. While it is possible to remove ε-arcs from a finite-state automaton efficiently without modifying its language, doing so changes the automaton’s set of paths. We give a construction that generalizes the Bar-Hillel construction to the case where the automaton has ε-arcs, and we further prove that the generalized construction yields a grammar that encodes the structure of both the input automaton and the input grammar while retaining the asymptotic size of the original construction.
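For orientation, the classic ε-free construction can be sketched in a few lines of Python. This is a minimal illustration of the textbook triple construction as it is commonly presented, not the generalized construction described above: it assumes the automaton has no ε-arcs, it is unweighted, and it does not prune useless nonterminals. The function name `bar_hillel` and the data layout (rules as `(lhs, rhs)` pairs, arcs as `(source, symbol, target)` triples) are hypothetical choices made for this sketch.

```python
from itertools import product


def bar_hillel(rules, start, arcs, initials, finals):
    """Classic Bar-Hillel construction for an epsilon-free automaton (sketch).

    rules:    list of (lhs, rhs) pairs, rhs being a tuple of symbols
    arcs:     set of (source_state, terminal, target_state) triples
    Returns (new_rules, new_start). Symbols of the new grammar are
    triples (q, X, r): "X derives a string that takes state q to state r".
    """
    states = sorted(
        {q for q, _, _ in arcs} | {r for _, _, r in arcs}
        | set(initials) | set(finals)
    )
    new_start = "S'"
    new_rules = []

    # 1) Start rules: S' -> (q_i, S, q_f) for initial q_i and final q_f.
    for qi in initials:
        for qf in finals:
            new_rules.append((new_start, ((qi, start, qf),)))

    # 2) For each rule X -> Y1 ... Yn, thread every state sequence
    #    q0 ... qn through the right-hand side.
    for lhs, rhs in rules:
        for qs in product(states, repeat=len(rhs) + 1):
            body = tuple((qs[i], y, qs[i + 1]) for i, y in enumerate(rhs))
            new_rules.append(((qs[0], lhs, qs[-1]), body))

    # 3) Terminal rules: (q, a, r) -> a for every arc q --a--> r.
    for q, a, r in sorted(arcs):
        new_rules.append(((q, a, r), (a,)))

    return new_rules, new_start


# Grammar for { a^n b^n : n >= 1 }.
rules = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
# Two-state automaton accepting strings with an even number of a's.
arcs = {(0, "a", 1), (1, "a", 0), (0, "b", 0), (1, "b", 1)}

new_rules, new_start = bar_hillel(rules, "S", arcs, initials=[0], finals=[0])
print(len(new_rules))  # 1 start + 16 + 8 threaded + 4 terminal = 29
```

A rule with n right-hand-side symbols is copied once per sequence of n + 1 states, which is why the size of the output grows with the automaton's state count raised to that power. The sketch also shows where ε-arcs cause trouble: step 3 needs each arc to emit exactly one terminal, so an arc that consumes no input has no counterpart among the rules generated here.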

[1]  Cyril Allauzen,et al.  Pushdown Automata in Statistical Machine Translation , 2014, CL.

[2]  Giorgio Satta,et al.  Prefix Probabilities for Linear Context-Free Rewriting Systems , 2011, IWPT.

[3]  Giorgio Satta,et al.  Computation of Infix Probabilities for Probabilistic Context-Free Grammars , 2011, EMNLP.

[4]  Roger Levy,et al.  Integrating surprisal and uncertain-input models in online sentence comprehension: formal techniques and empirical results , 2011, ACL.

[5]  Johan Schalkwyk,et al.  Filters for Efficient Composition of Weighted Finite-State Transducers , 2010, CIAA.

[6]  Andreas Maletti,et al.  Why Synchronous Tree Substitution Grammars? , 2010, NAACL.

[7]  C. D. L. Higuera,et al.  ε-Removal by Loop Reduction for Finite-state Automata , 2010 .

[8]  Giorgio Satta,et al.  Parsing Algorithms based on Tree Automata , 2009, IWPT.

[9]  M. Droste,et al.  Handbook of Weighted Automata , 2009 .

[10]  R. Levy A Noisy-Channel Model of Human Sentence Comprehension under Uncertain Input , 2008, EMNLP.

[11]  Liang Huang,et al.  Advanced Dynamic Programming in Semiring and Hypergraph Frameworks , 2008, COLING.

[12]  Giorgio Satta,et al.  Probabilistic Parsing as Intersection , 2003, IWPT.

[13]  C. Papadimitriou,et al.  Introduction to the Theory of Computation , 2018 .

[14]  Philip Resnik,et al.  A formal model of ambiguity and its applications in machine translation , 2010 .

[15]  Mehryar Mohri,et al.  Generic ε-removal Algorithm for Weighted Automata , 2007 .

[16]  Mehryar Mohri,et al.  Semiring Frameworks and Algorithms for Shortest-Distance Problems , 2002, J. Autom. Lang. Comb..

[17]  Jeffrey D. Ullman,et al.  Introduction to automata theory, languages, and computation, 2nd edition , 2001, SIGA.

[18]  Hisao Tamaki,et al.  Unfold/Fold Transformation of Logic Programs , 1984, ICLP.