A relevance model for a data warehouse contextualized with documents

This paper presents a relevance model for ranking the facts of a data warehouse that are described in a set of documents retrieved with an information retrieval (IR) query. The model is based on language modeling and relevance modeling techniques. We estimate the relevance of a fact as the probability of finding its dimension values and the query keywords in the documents that are relevant to the query. The model is the core of the so-called contextualized warehouse, a new kind of decision support system that combines structured data sources and document collections. The paper evaluates the relevance model on the Wall Street Journal (WSJ) TREC test subcollection and a self-constructed fact database.
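
As a rough illustration of the idea, the sketch below scores each fact by combining how likely each document is to generate the query keywords with how prominently the fact's dimension values appear in that document. It is not the paper's exact formulation: the function names (`term_prob`, `rank_facts`), the Jelinek-Mercer smoothing with `lam=0.5`, and the use of the mean frequency of dimension values as P(fact | document) are all assumptions made for this example.

```python
from collections import Counter
from math import prod


def term_prob(term, doc_counts, doc_len, coll_counts, coll_len, lam=0.5):
    """Jelinek-Mercer smoothed P(term | document) (smoothing choice is an assumption)."""
    p_doc = doc_counts[term] / doc_len if doc_len else 0.0
    p_coll = coll_counts[term] / coll_len if coll_len else 0.0
    return lam * p_doc + (1 - lam) * p_coll


def rank_facts(facts, docs, query, lam=0.5):
    """Rank facts by sum over documents of P(fact | d) * P(query | d), normalized.

    facts: dict mapping a hypothetical fact id to its list of dimension values
    docs:  list of tokenized documents (lists of strings)
    query: list of query keywords
    """
    doc_counts = [Counter(d) for d in docs]
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)
    coll_len = sum(coll_counts.values())

    # Query likelihood of each document under its smoothed language model.
    q_lik = [
        prod(term_prob(q, c, sum(c.values()), coll_counts, coll_len, lam) for q in query)
        for c in doc_counts
    ]
    total = sum(q_lik)
    if total == 0:
        return [(name, 0.0) for name in facts]

    scores = {}
    for name, dim_values in facts.items():
        score = 0.0
        for counts, ql in zip(doc_counts, q_lik):
            doc_len = sum(counts.values())
            if not doc_len or not dim_values:
                continue
            # Assumed P(fact | d): mean relative frequency of the dimension values in d.
            p_fact = sum(counts[v] / doc_len for v in dim_values) / len(dim_values)
            score += p_fact * ql / total
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = [
        ["oil", "prices", "rise", "texas"],
        ["oil", "texas", "texas"],
        ["tech", "stocks", "california"],
    ]
    facts = {"f_texas": ["texas"], "f_california": ["california"]}
    for name, score in rank_facts(facts, docs, ["oil"]):
        print(name, round(score, 4))
```

In this toy setup, a fact whose dimension values co-occur with the query keywords in the same documents receives a higher score than one whose values appear only in documents that match the query poorly.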
