Edinburgh Research Explorer

Union Types for Semistructured Data

Semistructured databases are treated as dynamically typed: they come equipped with no independent schema or type system to constrain the data. Query languages that are designed for semistructured data, even when used with structured data, typically ignore any type information that may be present. The consequences of this are what one would expect from using a dynamic type system with complex data: fewer guarantees on the correctness of applications. For example, a query that would cause a type error in a statically typed query language will return the empty set when applied to a semistructured representation of the same data. Much semistructured data originates in structured data. A semistructured representation is useful when one wants to add data that does not conform to the original type or when one wants to combine sources of different types. However, the deviations from the prescribed types are often minor, and we believe that a better strategy than throwing away all type information is to preserve as much of it as possible. We describe a system of untagged union types that can accommodate variations in structure while still allowing a degree of static type checking. A novelty of this system is that it involves non-trivial equivalences among types, arising from a law of distributivity for records and unions: a value may be introduced with one type (e.g., a record containing a union) and used at another type (a union of records). We describe programming and query language constructs for dealing with such types, prove the soundness of the type system, and develop algorithms for subtyping and typechecking.
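The distributivity law mentioned above can be illustrated with a small sketch. The Python below is not the paper's algorithm; it is a minimal illustration that assumes only base types, records, and untagged unions (no sets, recursion, or subtyping). The names `Base`, `Record`, `Union`, `normalize`, and `equivalent` are hypothetical and chosen for this example. The idea is to rewrite every type as a set of union-free alternatives by distributing records over the unions in their fields, then compare the resulting sets.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Base:
    name: str  # e.g. "int" or "string"


@dataclass(frozen=True)
class Record:
    fields: tuple  # (label, type) pairs, kept sorted by label


@dataclass(frozen=True)
class Union:
    alts: frozenset  # the alternative types of an untagged union


def record(**fields):
    """Build a record type with its fields in a canonical (sorted) order."""
    return Record(tuple(sorted(fields.items())))


def union(*alts):
    """Build an untagged union of the given types."""
    return Union(frozenset(alts))


def normalize(t):
    """Return the set of union-free types whose union is equivalent to t."""
    if isinstance(t, Base):
        return frozenset([t])
    if isinstance(t, Union):
        # Flatten nested unions into a single set of alternatives.
        return frozenset(a for alt in t.alts for a in normalize(alt))
    # Record case: pick one alternative per field, in every combination.
    labels = [label for label, _ in t.fields]
    choices = [normalize(field_type) for _, field_type in t.fields]
    return frozenset(
        Record(tuple(zip(labels, combo))) for combo in product(*choices)
    )


def equivalent(s, t):
    """Two types are treated as equivalent if they normalize identically."""
    return normalize(s) == normalize(t)


INT, STR = Base("int"), Base("string")

# A record containing a union ...
record_of_union = record(name=STR, id=union(INT, STR))
# ... and the corresponding union of records.
union_of_records = union(
    record(name=STR, id=INT),
    record(name=STR, id=STR),
)

print(equivalent(record_of_union, union_of_records))
```

Under these assumptions, a value built at the type `record_of_union` can be consumed by code written against `union_of_records`, because both normalize to the same two union-free record types. Note that the number of alternatives can grow multiplicatively with the number of union-typed fields, so this naive expansion is only suitable for illustration.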
