Knowledge discovery using Web bags in a Web warehouse

Sets and bags are closely related structures. A bag differs from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in a Web warehouse as part of our WHOWEDA project. Informally, a Web bag is a Web table that allows multiple occurrences of identical Web tuples. Web bags help to discover useful knowledge from a Web table, such as visible documents (or Web sites), luminous documents, and luminous paths. We formally discuss the semantics and properties of Web bags, and illustrate with examples the applications of Web bags to knowledge discovery in a Web warehouse.
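The set/bag distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `collections.Counter` stands in for a bag, and the URLs and the tuple-of-URLs layout are hypothetical placeholders, not the WHOWEDA representation of Web tuples.

```python
from collections import Counter

# Hypothetical Web tuples: each one is a path of document URLs.
# The URLs and the tuple layout are illustrative assumptions only.
web_tuples = [
    ("a.example/index", "a.example/news"),
    ("a.example/index", "a.example/news"),  # identical tuple occurs a second time
    ("a.example/index", "b.example/home"),
]

as_set = set(web_tuples)      # a set collapses identical tuples
as_bag = Counter(web_tuples)  # a bag keeps the multiplicity of each tuple

distinct_in_set = len(as_set)
total_in_bag = sum(as_bag.values())
copies_of_first = as_bag[("a.example/index", "a.example/news")]

print(distinct_in_set, total_in_bag, copies_of_first)
```

The set view loses the fact that one tuple occurred twice; the bag view retains that count, which is the kind of multiplicity information the abstract says Web bags preserve.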
