Management of semistructured data

A huge amount of data is available today on the Internet and on the private intranets of many companies. This data is structured in a multitude of ways. At one extreme we find data coming from traditional relational or object-oriented databases, with a completely known structure. At the other extreme we have data that is fully unstructured, such as images, sounds, and raw text. But most data falls somewhere between these two extremes, for a variety of reasons: the data may be structured, but the structure is not known to the user; the user may know the structure but choose to ignore it for browsing purposes; the structure may be implicit, as in formatted text, and not as rigid and regular as in traditional databases; the data may be in non-traditional formats, such as the ASN.1 exchange format; or the schema of the data may be huge and change often, so that we may prefer to ignore it. Several researchers have recently worked on problems related to data fitting this description and have coined the term semistructured data for it. Two recent tutorials [Abi97, Bun97] contain an excellent introduction to semistructured data and a comprehensive bibliography on this new research topic.
