Bounded seas

Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language's syntax (the islands), while the rest of the syntax (the water) is defined imprecisely. Usually, water is defined as the negation of islands. Albeit simple, such a definition of water is naive and impedes the composition of islands. When developing an island grammar, sooner or later a language engineer has to create water tailored to each individual island. Such an approach is fragile, because the water may have to change with any change to the grammar. It is time-consuming, because the water is defined manually by the engineer rather than automatically. Finally, an island surrounded by water cannot be reused, because the water has to be defined individually for every grammar. In this paper we propose a new technique of island parsing: bounded seas. Bounded seas are composable, robust, reusable and easy to use, because island-specific water is created automatically. Our work focuses on applications of island parsing to data extraction from source code. We have integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.

Highlights
- Traditional island grammars are difficult to define and are not flexible enough.
- Bounded seas, a new technique of island parsing, are composable, robust, reusable and easy to define.
- Bounded seas are specified using our extension of parsing expression grammars.
- Parsers utilizing bounded seas require less effort to define and provide both good precision and performance in the two case studies performed.
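The difference between naive water and a bounded sea can be illustrated with a minimal sketch, assuming a toy recognizer-style PEG combinator library written in Python. The classes `Parser`, `Lit`, `Seq`, `End`, `NaiveSea` and `Sea` are hypothetical and are not the API of the parser combinator framework mentioned above. As a simplification, the boundary of a sea is taken to be the parser that immediately follows it in its enclosing sequence (or the enclosing boundary when the sea is the last element), so the island-specific water is derived from the grammar rather than written by hand.

```python
from __future__ import annotations

from typing import Optional


class Parser:
    """Toy PEG recognizer: parse() returns the end position on success, None on failure."""

    def parse(self, text: str, pos: int, boundary: Parser) -> Optional[int]:
        raise NotImplementedError

    def starts(self, text: str, pos: int) -> bool:
        # Used when this parser acts as a boundary: does it match at pos?
        return self.parse(text, pos, END) is not None


class End(Parser):
    def parse(self, text, pos, boundary):
        return pos if pos == len(text) else None


END = End()


class Lit(Parser):
    def __init__(self, literal: str):
        self.literal = literal

    def parse(self, text, pos, boundary):
        return pos + len(self.literal) if text.startswith(self.literal, pos) else None


class Seq(Parser):
    def __init__(self, *items: Parser):
        self.items = items

    def parse(self, text, pos, boundary):
        for i, item in enumerate(self.items):
            # Simplification: an element's boundary is the element that follows it,
            # or the enclosing boundary for the last element.
            follow = self.items[i + 1] if i + 1 < len(self.items) else boundary
            pos = item.parse(text, pos, follow)
            if pos is None:
                return None
        return pos


class NaiveSea(Parser):
    """Water as the negation of the island: skip anything that is not the island."""

    def __init__(self, island: Parser):
        self.island = island

    def parse(self, text, pos, boundary):
        while pos <= len(text):
            end = self.island.parse(text, pos, boundary)
            if end is not None:
                break
            pos += 1
        else:
            return None  # the island was never found
        pos = end
        # Trailing water runs to the next island or end of input, ignoring the boundary.
        while pos < len(text) and self.island.parse(text, pos, boundary) is None:
            pos += 1
        return pos


class Sea(Parser):
    """Bounded sea: leading water stops at the island, trailing water at the boundary."""

    def __init__(self, island: Parser):
        self.island = island

    def starts(self, text, pos):
        # As a boundary, a sea begins where its island begins.
        return self.island.starts(text, pos)

    def parse(self, text, pos, boundary):
        # Leading water: fail if the boundary (or end of input) comes before the island.
        while True:
            end = self.island.parse(text, pos, boundary)
            if end is not None:
                break
            if pos >= len(text) or boundary.starts(text, pos):
                return None
            pos += 1
        # Trailing water: consume input up to where the boundary begins.
        pos = end
        while pos < len(text) and not boundary.starts(text, pos):
            pos += 1
        return pos


BOUNDED = Seq(Lit("{"), Sea(Lit("x")), Lit("}"), END)
NAIVE = Seq(Lit("{"), NaiveSea(Lit("x")), Lit("}"), END)

print(BOUNDED.parse("{ab x cd}", 0, END))  # 9: trailing water stops at the closing brace
print(BOUNDED.parse("{ab} x}", 0, END))    # None: boundary reached before the island
print(NAIVE.parse("{ab x cd}", 0, END))    # None: trailing water swallowed the brace
```

With water defined as the negation of the island, the trailing water of `NaiveSea` consumes the closing brace, so the enclosing sequence fails even on valid input; the bounded sea stops at `}` and also rejects input where the brace appears before the island. Seas also compose in this sketch: `Seq(Sea(Lit("a")), Sea(Lit("b")), END)` recognizes `"xxaxxbxx"`, because the water after `a` stops where the island of the following sea begins.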