Union Types for Semistructured Data

Semistructured databases are treated as dynamically typed: they come equipped with no independent schema or type system to constrain the data. Query languages that are designed for semistructured data, even when used with structured data, typically ignore any type information that may be present. The consequences of this are what one would expect from using a dynamic type system with complex data: fewer guarantees on the correctness of applications. For example, a query that would cause a type error in a statically typed query language will return the empty set when applied to a semistructured representation of the same data. Much semistructured data originates in structured data. A semistructured representation is useful when one wants to add data that does not conform to the original type or when one wants to combine sources of different types. However, the deviations from the prescribed types are often minor, and we believe that a better strategy than throwing away all type information is to preserve as much of it as possible. We describe a system of untagged union types that can accommodate variations in structure while still allowing a degree of static type checking. A novelty of this system is that it involves non-trivial equivalences among types, arising from a law of distributivity for records and unions: a value may be introduced with one type (e.g., a record containing a union) and used at another type (a union of records). We describe programming and query language constructs for dealing with such types, prove the soundness of the type system, and develop algorithms for subtyping and typechecking.
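The distributivity law mentioned above can be illustrated with a small sketch. The abstract does not give the paper's algorithms, so this is not a reconstruction of them. It is a naive normal-form check, assuming a hypothetical tuple encoding of types and hypothetical helper names (`base`, `record`, `union`, `normalize`, `equivalent`). It rewrites each type into a set of union-free alternatives by distributing records over unions, then compares the sets.

```python
from itertools import product

# Hypothetical encoding of types as plain tuples (an assumption of this sketch):
#   ("base", name)                        a base type such as int or string
#   ("record", ((label, type), ...))      a record type, fields sorted by label
#   ("union", (type, ...))                an untagged union of alternatives

def base(name):
    return ("base", name)

def record(**fields):
    return ("record", tuple(sorted(fields.items())))

def union(*alternatives):
    return ("union", tuple(alternatives))

def normalize(t):
    """Return the set of union-free alternatives of t.

    Records are distributed over unions, so a record with a union-typed
    field becomes one union-free record per choice of alternative.
    The result can be exponentially large; this is a naive sketch.
    """
    kind = t[0]
    if kind == "base":
        return frozenset([t])
    if kind == "union":
        return frozenset().union(*(normalize(a) for a in t[1]))
    if kind == "record":
        labels = [label for label, _ in t[1]]
        choices = [list(normalize(field_type)) for _, field_type in t[1]]
        return frozenset(
            ("record", tuple(zip(labels, combo))) for combo in product(*choices)
        )
    raise ValueError(f"unknown type constructor: {kind}")

def equivalent(s, t):
    """Two types are treated as equivalent if their normal forms coincide."""
    return normalize(s) == normalize(t)

int_t, str_t = base("int"), base("string")

# A record containing a union ...
rec_of_union = record(name=str_t, phone=union(int_t, str_t))
# ... and the corresponding union of records.
union_of_recs = union(
    record(name=str_t, phone=int_t),
    record(name=str_t, phone=str_t),
)

print(equivalent(rec_of_union, union_of_recs))
print(equivalent(rec_of_union, record(name=str_t, phone=int_t)))
```

Under this encoding the first two types have the same normal form, so a value introduced at the record-of-union type could be used at the union-of-records type. The last comparison shows that a single alternative on its own is not equivalent to the full union.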
