The next 700 data description languages

In the spirit of Landin, we present a calculus of dependent types to serve as the semantic foundation for a family of languages called data description languages. Such languages, which include PADS, DataScript, and PacketTypes, are designed to facilitate programming with ad hoc data, i.e., data not in well-behaved relational or XML formats. In the calculus, each type describes the physical layout and semantic properties of a data source. In the semantics, we interpret types simultaneously as the in-memory representation of the data described and as parsers for the data source. The parsing functions are robust, automatically detecting and recording errors in the data stream without halting parsing. We show that the parsers are type-correct, returning data whose type matches the simple-type interpretation of the specification. We also prove that the parsers are "error-correct," accurately reporting the number of physical and semantic errors that occur in the returned data. We use the calculus to describe the features of various data description languages, and we discuss how we have used the calculus to improve PADS.
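The idea of reading one description both as a data representation and as an error-recording parser can be illustrated with a minimal sketch. This is not the paper's calculus or the PADS implementation; the names `PD`, `p_int`, `p_lit`, `p_where`, and `p_sigma` are hypothetical and chosen only for this example. Each parser returns a representation together with a parse descriptor that counts errors, and parsing continues after an error instead of raising an exception.

```python
from dataclasses import dataclass


@dataclass
class PD:
    """Hypothetical parse descriptor: records errors instead of raising."""
    nerr: int = 0      # number of errors in this value and its components
    note: str = "ok"   # short description of the first problem, if any


# A parser maps (input, offset) to (representation, descriptor, new offset).

def p_int(s, i):
    """Base type: a run of decimal digits."""
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    if j == i:
        # Physical (syntactic) error: nothing consumed, parsing goes on.
        return None, PD(1, "syntax: expected digits"), i
    return int(s[i:j]), PD(), j


def p_lit(text):
    """Literal type: matches exactly `text`."""
    def parse(s, i):
        if s.startswith(text, i):
            return text, PD(), i + len(text)
        return None, PD(1, f"syntax: expected {text!r}"), i
    return parse


def p_where(p, pred):
    """Constrained type {x : p | pred x}: a failed predicate is a semantic error."""
    def parse(s, i):
        v, pd, j = p(s, i)
        if pd.nerr == 0 and not pred(v):
            return v, PD(1, "semantic: constraint violated"), j
        return v, pd, j
    return parse


def p_sigma(p1, sep, make_p2):
    """Dependent pair: the second component's type depends on the first value."""
    def parse(s, i):
        v1, pd1, i = p1(s, i)
        _, pd_sep, i = sep(s, i)
        v2, pd2, i = make_p2(v1)(s, i)
        total = pd1.nerr + pd_sep.nerr + pd2.nerr
        return (v1, v2), PD(total, "ok" if total == 0 else "errors in components"), i
    return parse


# Description of a record "lo|hi" in which hi must not be smaller than lo.
range_t = p_sigma(
    p_int,
    p_lit("|"),
    lambda lo: p_where(p_int, lambda hi: lo is None or hi >= lo),
)
```

Calling `range_t("7|3", 0)` in this sketch still returns both fields, and the descriptor's `nerr` field reports the violated constraint, which mirrors the abstract's point that errors are recorded without halting the parse. The sketch does no resynchronization after a syntax error, so it should not be read as a model of the error recovery in any real data description language.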
