The next 700 data description languages

In the spirit of Landin, we present a calculus of dependent types to serve as the semantic foundation for a family of languages called data description languages. Such languages, which include PADS, DataScript, and PacketTypes, are designed to facilitate programming with ad hoc data, that is, data not in well-behaved relational or XML formats. In the calculus, each type describes the physical layout and semantic properties of a data source. In the semantics, we interpret types simultaneously as the in-memory representation of the data described and as parsers for the data source. The parsing functions are robust, automatically detecting and recording errors in the data stream without halting parsing. We show the parsers are type-correct, returning data whose type matches the simple-type interpretation of the specification. We also prove the parsers are “error-correct,” accurately reporting the number of physical and semantic errors that occur in the returned data. We use the calculus to describe the features of various data description languages, and we discuss how we have used the calculus to improve PADS.
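The idea that one description yields both a representation and a robust parser can be illustrated with a small sketch. This is not the paper's calculus: it omits dependent types entirely, and every name in it (PD, p_int, p_lit, p_constrain, p_seq, record) is hypothetical, chosen only for illustration. It assumes that each parser returns a value, a parse descriptor carrying an error count, and the next input position, and that a composite description reports the sum of its parts' errors.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class PD:
    """Hypothetical parse descriptor: number of errors in the parsed value."""
    nerr: int = 0


# A description acts as a parser: (data, pos) -> (representation, PD, new_pos).
Parser = Callable[[str, int], Tuple[Any, PD, int]]


def p_int(data: str, pos: int) -> Tuple[Any, PD, int]:
    """Read a run of decimal digits; record one physical error if none are found."""
    end = pos
    while end < len(data) and data[end].isdigit():
        end += 1
    if end == pos:
        return None, PD(1), pos  # error is recorded, parsing does not halt
    return int(data[pos:end]), PD(0), end


def p_lit(text: str) -> Parser:
    """Expect the literal text; on a mismatch record an error and consume nothing."""
    def parse(data: str, pos: int) -> Tuple[Any, PD, int]:
        if data.startswith(text, pos):
            return text, PD(0), pos + len(text)
        return None, PD(1), pos
    return parse


def p_constrain(inner: Parser, ok: Callable[[Any], bool]) -> Parser:
    """Attach a semantic constraint; a violation adds one error but keeps the value."""
    def parse(data: str, pos: int) -> Tuple[Any, PD, int]:
        rep, pd, nxt = inner(data, pos)
        if pd.nerr == 0 and not ok(rep):
            return rep, PD(1), nxt
        return rep, pd, nxt
    return parse


def p_seq(*parts: Parser) -> Parser:
    """Parse the parts in order; the error count is the sum over all parts."""
    def parse(data: str, pos: int) -> Tuple[Any, PD, int]:
        reps, total = [], 0
        for part in parts:
            rep, pd, pos = part(data, pos)
            reps.append(rep)
            total += pd.nerr
        return tuple(reps), PD(total), pos
    return parse


# A record "<int below 256>,<int>", written as one description.
record = p_seq(p_constrain(p_int, lambda n: n < 256), p_lit(","), p_int)

print(record("12,34", 0))   # well-formed input: no errors
print(record("300,7", 0))   # semantic error: 300 violates the constraint
print(record("999;x", 0))   # one semantic and two physical errors
```

In this sketch the returned tuple plays the role of the in-memory representation, while the PD value mirrors the "error-correct" property in miniature: its count equals the number of errors recorded in the components of the returned data.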
