PADS: a domain-specific language for processing ad hoc data

PADS is a declarative data description language that allows data analysts to describe both the physical layout of ad hoc data sources and semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data, including parsing routines, statistical profiling tools, translation programs that produce well-behaved formats such as XML or those required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation while remaining flexible enough to describe most of the ASCII, binary, and COBOL formats that we have seen in practice. The generated parsing library provides for robust, application-specific error handling.
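To make the idea of a description that covers both physical layout and semantic properties more concrete, the following is a minimal Python sketch. It is not PADS syntax and not the API of the generated library. The `Field` class, the `RECORD` description, the pipe-separated record format, and the `parse_record` function are all hypothetical names chosen for illustration. The sketch shows one way a parser driven by such a description might report errors as data instead of aborting, leaving the application to decide how to respond.

```python
from dataclasses import dataclass
from typing import Callable


def always_ok(value):
    """Default semantic check: accept every value."""
    return True


@dataclass
class Field:
    """One field of a hypothetical record description (not PADS syntax)."""
    name: str
    convert: Callable[[str], object]            # physical layout: raw text -> value
    check: Callable[[object], bool] = always_ok  # semantic property of the value


# Assumed example format: client_id|status|bytes
RECORD = [
    Field("client_id", int, lambda v: v > 0),
    Field("status", int, lambda v: 100 <= v < 600),
    Field("bytes", int, lambda v: v >= 0),
]


def parse_record(line, fields=RECORD, sep="|"):
    """Parse one line against the description.

    Returns (values, errors). Bad input never raises; each problem is
    recorded as a message so the caller can choose how to handle it.
    """
    values, errors = {}, []
    parts = line.rstrip("\n").split(sep)
    if len(parts) != len(fields):
        errors.append(f"expected {len(fields)} fields, found {len(parts)}")
    for field, raw in zip(fields, parts):
        try:
            value = field.convert(raw)
        except ValueError:
            values[field.name] = None
            errors.append(f"{field.name}: cannot convert {raw!r}")
            continue
        values[field.name] = value
        if not field.check(value):
            errors.append(f"{field.name}: constraint violated by {value!r}")
    return values, errors
```

A caller might discard any record with a non-empty error list, or keep the record and log only the constraint violations; the description itself stays the same in either case.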
