Web Bags - Are They Useful in A Web Warehouse?

Sets and bags are closely related structures. A bag differs from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a web bag in a web warehouse as part of our Web Information Coupling System (WICS). Informally, a web bag is a web table that allows multiple occurrences of identical web tuples. A web bag helps to discover useful knowledge from a web table, such as visible documents (or web sites), luminous documents, and luminous paths. We formally discuss the semantics and properties of web bags and illustrate, with examples, applications of web bags to knowledge discovery in a web warehouse.
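
The distinction between a set and a bag can be illustrated with a minimal Python sketch. It is not the WICS implementation: the tuple layout, the example URLs, and the names `web_tuples`, `as_set`, `as_bag`, and `multiplicity` are all assumptions made for illustration, and a web tuple is simplified here to a plain tuple of URL strings.

```python
from collections import Counter

# Hypothetical web tuples: each one is simplified to a pair of URLs.
# The data is illustrative only and does not come from the paper.
web_tuples = [
    ("a.example/index", "b.example/page"),
    ("a.example/index", "b.example/page"),  # identical tuple occurring again
    ("c.example/home", "b.example/page"),
]

# Set semantics: identical tuples collapse into a single element.
as_set = set(web_tuples)

# Bag semantics: Counter records how many times each tuple occurs.
as_bag = Counter(web_tuples)


def multiplicity(bag, web_tuple):
    """Return how many times web_tuple occurs in the bag (0 if absent)."""
    return bag[web_tuple]


repeated = ("a.example/index", "b.example/page")
print(len(as_set))                     # number of distinct tuples
print(sum(as_bag.values()))            # number of tuples counting duplicates
print(multiplicity(as_bag, repeated))  # occurrences of one tuple
```

The set discards the information that one tuple occurs twice, whereas the bag retains it. Occurrence counts of this kind are what a bag makes available and a set does not.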
