Semantic Models in Information Retrieval

Robertson and Spärck Jones pioneered experimental probabilistic models (Binary Independence Model) with both a typology generalizing the Boolean model, a frequency counting to calculate elementary weightings, and their combination into a global probabilistic estimation. However, this model did not consider indexing terms dependencies. An extension to mixture models (e.g., using a 2-Poisson law) made it possible to take into account these dependencies from a macroscopic point of view (BM25), as well as a shallow linguistic processing of co-references. New approaches (language models, for example “bag of words” models, probabilistic dependencies between requests and documents, and consequently Bayesian inference using Dirichlet prior conjugate) furnished new solutions for documents structuring (categorization) and for index smoothing. Presently, in these probabilistic models the main issues have been addressed from a formal point of view only. Thus, linguistic properties are neglected in the indexing language. The authors examine how a linguistic and semantic modeling can be integrated in indexing languages and set up a hybrid model that makes it possible to deal with different information retrieval problems in a unified way. DOI: 10.4018/978-1-4666-0330-1.ch007

[1]  Mark Steedman,et al.  Combinatory grammars and parasitic gaps , 1987 .

[2]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 1 , 2000, Inf. Process. Manag..

[3]  Steve Young,et al.  Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .

[4]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[5]  Stephen E. Robertson,et al.  Probabilistic models of indexing and searching , 1980, SIGIR '80.

[6]  Aravind K. Joshi,et al.  A Formalism for Dependency Grammar Based on Tree Adjoining Grammar , 2003 .

[7]  Warren R. Greiff,et al.  A theory of term weighting based on exploratory data analysis , 1998, SIGIR '98.

[8]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[9]  Peter W. Foltz,et al.  An introduction to latent semantic analysis , 1998 .

[10]  Tom Minka,et al.  Expectation-Propogation for the Generative Aspect Model , 2002, UAI.

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Djoerd Hiemstra,et al.  A probabilistic justification for using tf×idf term weighting in information retrieval , 2000, International Journal on Digital Libraries.

[13]  Hugo Zaragoza,et al.  The Probabilistic Relevance Framework: BM25 and Beyond , 2009, Found. Trends Inf. Retr..

[14]  W. Bruce Croft,et al.  Using Probabilistic Models of Document Retrieval without Relevance Information , 1979, J. Documentation.

[15]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.