Anatomy of the coupling query in a web warehouse

Abstract. To populate a data warehouse designed specifically for Web data, i.e., a web warehouse, it is imperative to harness relevant documents from the Web. In this paper, we describe a query mechanism called the coupling query, which gleans relevant Web data in the context of our web warehousing system, Warehouse Of Web Data (WHOWEDA). A coupling query may be used to query both HTML and XML documents. Important features of the query mechanism include the ability to query the metadata, content, and internal and external (hyperlink) structure of Web documents based on partial knowledge; the ability to express constraints on tag attributes and tagless segments of data; the ability to express conjunctive as well as disjunctive query conditions compactly; the ability to control the execution of a web query; and preservation of the topological structure of hyperlinked documents in the query results. We also discuss how to formulate a query graphically and in textual form, using a coupling graph and coupling text, respectively.
