We have designed a system, called STRUDEL, that applies familiar concepts from database management systems to the process of building web sites. The main motivation for developing STRUDEL is the observation that, with current technology, creating and managing large sites is tedious because a site designer must simultaneously perform (at least) three tasks: (1) choosing what information will be available at the site, (2) organizing that information in individual pages or in graphs of linked pages, and (3) specifying the visual presentation of pages in HTML. Furthermore, since there is no separation between the physical organization of the information underlying a web site and the logical view we have of it, changing or restructuring a site is an unwieldy task. In STRUDEL, the web site manager can separate the logical view of information available at a web site, the structure of that information in linked pages, and the graphical presentation of pages in HTML. First, the site builder independently defines the data that will be available at the site. This process may require creating an integrated view of data from multiple (external) sources. Second, the site builder defines the structure of the web site. The structure is defined as a view over the underlying information, and different versions of the site can be defined by specifying multiple views. Finally, the graphical representation of the pages in the web site is specified. This paper describes the query language that lies at the heart of the STRUDEL system. In STRUDEL, we model the data at the different levels as graphs. That is, the data in the external sources, the data in the integrated view, and the web site itself are modeled as graphs. A graph model is appropriate because site data may be derived from multiple sources, such as existing database systems and HTML files.
Consequently, our system requires a query language for (1) defining the integrated view of the data, and (2) defining the structure of web sites. An important requirement of our query language is that it be able to construct graphs. Our query processor needs to be able to answer queries that involve accessing different data sources. Even though we model the sources as containing graphs, we cannot assume they have a uniform representation of graphs. Hence, our query processor needs to respect possible limitations on access to data in the graphs, and should be able to exploit additional querying capabilities that an external source may have. We have designed a general framework for processing STRUDEL queries over multiple unstructured data sources, and are designing optimizations that use the capabilities of external sources whenever possible. The purpose of this paper is to describe the syntax and semantics of STRUQL, the query language at the core of STRUDEL. We believe that STRUQL is a language of independent interest, useful both for other applications involving the management of semistructured data and as a view definition language for such data. We discuss the relationship of STRUQL to other languages proposed in the literature in Section 6; see [Abi97, Bun97].
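To make the idea of a site defined as a view over a data graph concrete, the following is a minimal sketch in Python. It is not STRUQL syntax and not the STRUDEL implementation; the `Graph` class, the `build_site` function, and the node and label names (`root`, `paper`, `title`, `home`, `link`) are all hypothetical, chosen only for illustration. The data graph is a set of labeled edges, and the view constructs a new graph (the site) from it.

```python
class Graph:
    """A labeled directed graph stored as (source, label, target) triples."""

    def __init__(self):
        self.edges = set()

    def add(self, src, label, dst):
        self.edges.add((src, label, dst))

    def out(self, src, label):
        # All targets reachable from src over one edge with the given label.
        return sorted(d for s, l, d in self.edges if s == src and l == label)


def build_site(data):
    """Hypothetical view: one page node per paper, linked from a home page."""
    site = Graph()
    for paper in data.out("root", "paper"):
        page = f"page({paper})"          # new node constructed by the view
        site.add("home", "link", page)
        for title in data.out(paper, "title"):
            site.add(page, "title", title)
    return site


# Illustrative data graph (assumed content, not taken from the paper).
data = Graph()
data.add("root", "paper", "p1")
data.add("root", "paper", "p2")
data.add("p1", "title", "Queries on graphs")
data.add("p2", "title", "Views for Semistructured Data")

site = build_site(data)
```

A second version of the site could be obtained by writing another view function over the same `data` graph, which mirrors the separation between the underlying information and the site structure described above.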
[1] Roy Goldman, et al. Views for Semistructured Data. 1997.
[2] Jennifer Widom, et al. Object exchange across heterogeneous information sources. Proceedings of the Eleventh International Conference on Data Engineering, 1995.
[3] Joann J. Ordille, et al. Querying Heterogeneous Information Sources Using Source Descriptions. VLDB, 1996.
[4] Peter T. Wood, et al. Queries on graphs. 1989.
[5] Alberto O. Mendelzon, et al. Architecture and Applications of the Hy+ Visualization System. IBM Syst. J., 1994.
[6] Dan Suciu, et al. A query language and optimization techniques for unstructured data. SIGMOD '96, 1996.
[7] Serge Abiteboul, et al. Querying Semi-Structured Data. Encyclopedia of Database Systems, 1997.
[8] Neil Immerman, et al. Languages that Capture Complexity Classes. SIAM J. Comput., 1987.
[9] Alberto O. Mendelzon, et al. G+: Recursive Queries Without Recursion. Expert Database Conf., 1988.
[10] Alberto O. Mendelzon, et al. A graphical query language supporting recursion. SIGMOD '87, 1987.