Bounded Seas - Island Parsing Without Shipwrecks

Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language's syntax (the islands), while the rest of the syntax (the water) is defined imprecisely. Usually, water is defined as the negation of islands. Although simple, this definition is naive and impedes the composition of islands. When developing an island grammar, a programmer sooner or later has to create water tailored to each individual island. This approach is fragile, because the water can change with any change to the grammar. It is time-consuming, because the water is defined manually by the programmer rather than automatically. Finally, an island surrounded by water cannot be reused, because the water has to be defined for every grammar individually. In this paper we propose a new island parsing technique: bounded seas. Bounded seas are composable, robust, reusable, and easy to use because island-specific water is created automatically. We integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.
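To make the contrast between naive water and island-specific water concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from the paper: the combinator names (`literal`, `naive_sea`, `bounded_sea`) are hypothetical, and the boundary is passed in by hand, whereas the abstract states that bounded seas derive island-specific water automatically.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Result:
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def naive_sea(island: Parser) -> Parser:
    """Water as the negation of the island: skip anything until the island matches."""
    def parse(text: str, pos: int) -> Result:
        for start in range(pos, len(text) + 1):
            result = island(text, start)
            if result is not None:
                return result
        return None
    return parse


def bounded_sea(island: Parser, boundary: Parser) -> Parser:
    """Water that also stops at a boundary (assumption: supplied manually here).

    The search fails as soon as the boundary is seen before the island,
    so the water cannot leak past the enclosing construct.
    """
    def parse(text: str, pos: int) -> Result:
        for start in range(pos, len(text) + 1):
            result = island(text, start)
            if result is not None:
                return result
            if boundary(text, start) is not None:
                return None
        return None
    return parse


# Two blocks: the first is empty, the second contains the island "method".
text = "{ } { method }"
method = literal("method")
close = literal("}")

# Searching inside the first block (position 1, just after "{"):
leaked = naive_sea(method)(text, 1)             # skips past "}" into block two
contained = bounded_sea(method, close)(text, 1)  # stops at the first "}"
found = bounded_sea(method, close)(text, 5)      # inside block two
```

In this sketch the naive sea, started inside the first block, consumes the closing brace and reports the island from the second block, which is the kind of composition problem the abstract describes. The bounded variant fails at the first `}` instead, so an enclosing rule can treat the island as absent from that block.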
