PADX: Querying Large-scale Ad Hoc Data with XQuery

This paper describes our experience designing and implementing PADX, a system for querying large-scale ad hoc data sources with XQuery. PADX synthesizes and extends two existing systems: PADS and Galax. With PADX, an analyst writes a declarative description of the physical layout of her ad hoc data, and the PADS compiler produces customizable libraries for parsing the data and for viewing it as XML. The resulting libraries are linked with an XQuery engine, permitting the analyst to view and query her ad hoc data sources using XQuery.