Declarative Languages for Querying Portal Catalogs

As data is increasingly captured, aggregated, and digitized worldwide, new types of information systems, such as digital libraries and information (subject) gateways, are emerging as core technologies of the 21st-century economy. After a first generation of systems that focused on making available information resources accessible, high-quality information collections are now being transformed into Community Web Portals. These Portals provide the means to select, classify, and access diverse information resources in a semantically meaningful and ubiquitous way, in order to develop and maintain specific communities of interest (e.g., professional, trading, etc.) on corporate intranets or the Web. A key Portal component is the Knowledge Catalog, which holds descriptive information, i.e., metadata, about the community resources (e.g., sites, documents, data, etc.). Despite current developments in standards for describing the content and meaning of information resources (see the W3C Metadata Activity), declarative languages suitable for querying both their semantic descriptions and the employed schemas are still missing. In this paper we present such a high-level query language for Portal Catalogs (e.g., Open Directory, CNET, XMLNews) created according to the Resource Description Framework (RDF) standard [15, 4]. RDF [15] aims to facilitate the creation and exchange of metadata like any other Web data. RDF resource descriptions are represented as directed labeled graphs (where nodes are called resources or literals and edges are called properties), which can be serialized in XML. Furthermore, RDF schema [4] vocabularies are used to define the labels of nodes (called classes) and edges that can be used to describe and query resources in specific communities. These labels can be organized into appropriate taxonomies, carrying the inclusion semantics of subjects/topics in a Portal Catalog.
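
The graph model and the inclusion semantics of class taxonomies described above can be illustrated with a minimal Python sketch. This is not RQL and not the authors' implementation; the class names (Artist, Painter, Cubist, Sculptor, Museum), the resource identifiers, and the helper functions are all hypothetical, chosen only to contrast matching a class label literally with matching it through the taxonomy.

```python
from collections import defaultdict

# Hypothetical schema: (subclass, superclass) pairs forming a small taxonomy.
SUBCLASS_OF = [
    ("Painter", "Artist"),
    ("Sculptor", "Artist"),
    ("Cubist", "Painter"),
]

# Hypothetical resource descriptions as (subject, property, object) edges
# of a directed labeled graph.
TRIPLES = [
    ("r1", "rdf:type", "Cubist"),
    ("r2", "rdf:type", "Sculptor"),
    ("r3", "rdf:type", "Museum"),
    ("r1", "creates", "r4"),
]


def subclasses(cls, subclass_of):
    """Return cls together with all of its transitive subclasses."""
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)
    found, stack = {cls}, [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def instances_by_label(cls, triples):
    """Pattern matching: only resources typed with exactly this label."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


def instances_of(cls, triples, subclass_of):
    """Semantic matching: resources typed with cls or any of its subclasses."""
    wanted = subclasses(cls, subclass_of)
    return sorted(s for s, p, o in triples if p == "rdf:type" and o in wanted)


if __name__ == "__main__":
    print(instances_by_label("Artist", TRIPLES))
    print(instances_of("Artist", TRIPLES, SUBCLASS_OF))
```

In this toy catalog no resource is typed directly as Artist, so the literal lookup finds nothing, while the taxonomy-aware lookup returns the resources classified under Cubist and Sculptor.
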
In this context, our query language, called RQL, relies on a graph data model that allows us to interpret semistructured RDF descriptions by means of one or more RDF schemas. Note that RDF schemas (a) do not impose strict typing on the data (e.g., by permitting multiple classification and optional or repeated properties); (b) can be easily extended (e.g., through specialization of both classes and property types); (c) may provide only a partial or overlapping interpretation of the underlying data (e.g., by having several, possibly incomplete schemas for the same resource descriptions); and (d) are not entirely separated from the resource descriptions (i.e., they can be queried like normal data). Thus, RQL shares the flexibility and utility of recent proposals for semistructured or XML query languages while extending their functionality to the RDF schema level, by transparently exploring the defined taxonomies of classes and properties as well as the multiple classification of resources. To the best of our knowledge, RQL is the first language to smoothly combine features from thesauri-based information retrieval systems (i.e., term expansion mechanisms [12]) with semistructured or XML query languages featuring variables on both property and class names (i.e., generalized path expressions [1]). Our work is motivated by the fact that existing semistructured models (e.g., OEM [18], YAT [8]) cannot capture the semantics of node and edge labels provided by RDF schemas (i.e., taxonomies of classes and property types), while semistructured or XML query languages (e.g., LOREL [2], UnQL [5], StruQL [11], XML-QL [10], XML-GL [7]) are not suited to exploit RDF schema information (i.e., pattern vs. semantic matching of labels). On the other hand, database (relational or object) schema query languages such as SchemaSQL [14], XSQL [13], or Noodle [17] fail to fully accommodate RDFS

[1] Michael Kifer, et al. Querying object-oriented databases, 1992, SIGMOD '92.

[2] Dan Brickley, et al. Resource Description Framework (RDF) Model and Syntax Specification, 2002.

[3] Jennifer Widom, et al. The Lorel query language for semistructured data, 1997, International Journal on Digital Libraries.

[4] Massimo Marchiori, et al. Query + Metadata + Logic = Metalog, 1998, QL.

[5] Dan Suciu, et al. STRUDEL: a Web site management system, 1997, SIGMOD '97.

[6] Laks V. S. Lakshmanan, et al. SchemaSQL - A Language for Interoperability in Relational Multi-Database Systems, 1996, VLDB.

[7] David Jordan, et al. The Object Database Standard: ODMG 2.0, 1997.

[8] Sophie Cluet, et al. Your mediators need data conversion!, 1998, SIGMOD '98.

[9] Jennifer Widom, et al. Object exchange across heterogeneous information sources, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[10] Alin Deutsch, et al. A Query Language for XML, 1999, Comput. Networks.

[11] Dan Suciu, et al. Programming Constructs for Unstructured Data, 1995, DBPL.

[12] Kenneth A. Ross, et al. Noodle: A Language for Declarative Querying in an Object-Oriented Database, 1993, DOOD.

[13] R. G. G. Cattell, et al. The Object Database Standard: ODMG-93, 1993.

[14] Dan Brickley, et al. Resource description framework (RDF) schema specification, 1998.

[15] Guido Moerkotte, et al. Querying documents in object databases, 1997, International Journal on Digital Libraries.

[16] Letizia Tanca, et al. XML-GL: A Graphical Language for Querying and Restructuring XML Documents, 1999, SEBD.