Web Warehousing: Design and Issues

The World Wide Web is a distributed global information resource. It contains a large amount of information that has been placed on the web independently by different organizations; as a result, related information may appear across different web sites. To manage and access heterogeneous information on the WWW, we have started a project to build a web warehouse called Whoweda (Warehouse of Web Data). So far, our work on the web warehousing system has focused on building a data model and designing a web algebra. In this paper, we discuss design and research issues in a web warehousing system. These issues include designing algebraic operators for web information access and manipulation, web data visualization, and web knowledge discovery. Addressing these issues will not only overcome the limitations of available search engines but also provide powerful and user-friendly query mechanisms for retrieving useful information and discovering knowledge from a web warehouse.
