What can a web bag discover for you?

Sets and bags are closely related structures. A bag differs from a set in that it is sensitive to the number of times an element occurs, whereas a set is not. In this paper, we introduce the concept of a web bag in the context of the web warehouse project WHOWEDA (Warehouse of Web Data). Informally, a web bag is a web table that allows multiple occurrences of identical web tuples. We use web bags to discover useful knowledge from a web table, such as visible documents (or web sites), luminous documents and luminous paths. We formally discuss the semantics and properties of web bags and design formal algorithms for constructing a web bag and its schema. In addition, we provide formal algorithms for several types of knowledge discovery in a web warehouse using web bags, and we illustrate them with examples.
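To make the set/bag distinction concrete, the sketch below projects a small table of hyperlinks onto one end of each link. It keeps duplicates (a bag) in one case and eliminates them (a set) in the other, then uses the bag's multiplicities to pick out frequently referenced and frequently referencing documents. It rests on several simplifying assumptions that are not the paper's formal definitions. A web tuple is modelled as a plain `(source_url, target_url)` pair rather than WHOWEDA's richer tuple structure. "Visibility" is taken to mean the number of distinct documents linking to a document, and "luminosity" to mean the number of outgoing links. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical simplification: a web tuple is one (source_url, target_url)
# hyperlink, and the web table holds distinct links (no duplicate pairs).
WebTuple = Tuple[str, str]


def project_bag(web_table: List[WebTuple], position: int) -> Counter:
    """Project each tuple onto one node, keeping duplicates (a web bag)."""
    return Counter(t[position] for t in web_table)


def project_set(web_table: List[WebTuple], position: int) -> Set[str]:
    """The same projection with duplicates eliminated (set semantics)."""
    return {t[position] for t in web_table}


def visible_documents(web_table: List[WebTuple], threshold: int) -> Set[str]:
    """Targets linked from at least `threshold` distinct sources.

    Assumed notion of visibility: because the links are distinct, the
    multiplicity of a target in the bag equals its number of distinct sources.
    """
    bag = project_bag(web_table, 1)
    return {doc for doc, n in bag.items() if n >= threshold}


def luminous_documents(web_table: List[WebTuple], threshold: int) -> Set[str]:
    """Sources with at least `threshold` outgoing links (assumed luminosity)."""
    bag = project_bag(web_table, 0)
    return {doc for doc, n in bag.items() if n >= threshold}


# Illustrative data, not taken from the paper.
links = [
    ("a.html", "x.html"),
    ("b.html", "x.html"),
    ("c.html", "x.html"),
    ("a.html", "y.html"),
    ("a.html", "z.html"),
]

print(project_bag(links, 1))           # multiplicities survive in the bag
print(sorted(project_set(links, 1)))   # multiplicities are lost in the set
print(visible_documents(links, 2))
print(luminous_documents(links, 2))
```

The set projection reports only that `x.html`, `y.html` and `z.html` are link targets. The bag projection also records that `x.html` is referenced three times, and that count is exactly the information a visibility-style query needs.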
