Towards a Data Warehouse Contextualized with Web Opinions

In this work we consider web forums where users give their opinions about the products or services that organizations offer. The OLAP tools of traditional data warehouse systems, designed mainly to analyse structured data, cannot be directly applied to take advantage of these online text documents. This paper describes the objectives of our new project on so-called contextualized warehouses, which aims to exploit these opinion documents. In the analysis cubes of a contextualized warehouse, each fact is linked to a list of documents that provide information related to the fact (i.e., they describe its context). The opinions in web posts are typically expressed as short text fragments that sometimes include incomplete sentences. We propose to extend the contextualized warehouse infrastructure with new opinion retrieval techniques designed to classify and search for opinions in document collections with these characteristics. Since the project is still in its early stages, the paper mainly studies the requirements, reviews the main technologies that will be involved in its development, and discusses our current and future work.
