Paper information - Estimation of Query Model from Parsimonious Translation Model

The KL divergence framework, an extension of the language modeling approach, has a critical problem: estimating the query model, the probabilistic model that encodes the user's information need. At initial retrieval, it is difficult to expand the query model using co-occurrence, because two-dimensional matrix information such as term co-occurrence must be constructed offline. For large collections in particular, building such a large matrix of term co-occurrences is prohibitively expensive in both time and space. This paper proposes an effective method for constructing co-occurrence statistics by employing a parsimonious translation model. A parsimonious translation model is a compact version of a translation model that retains only a very small number of parameters with non-zero probabilities. It drastically reduces the number of terms retained in each document, so that co-occurrence statistics can be computed in tractable time. In experiments, the query model derived from the parsimonious translation model significantly improves on the baseline language modeling performance.
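The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of the general idea: prune each document to its few most document-specific terms, then count co-occurrences over the pruned documents. It assumes an EM-style parsimonious estimate against a background collection model; the function names `parsimonious_terms` and `cooccurrence_counts`, the mixing weight `lam`, and the pruning `threshold` are all hypothetical choices for illustration, not values taken from the paper.

```python
from collections import Counter
from itertools import combinations


def parsimonious_terms(tokens, background, lam=0.1, iters=20, threshold=0.01):
    """Keep only terms that the document explains better than the background.

    Assumed mixture: lam * p(w|D) + (1 - lam) * p(w|C), fitted with EM.
    Terms whose p(w|D) falls below `threshold` are dropped (set to zero).
    """
    tf = Counter(tokens)
    total = sum(tf.values())
    p_doc = {w: c / total for w, c in tf.items()}
    for _ in range(iters):
        # E-step: expected count of each term attributed to the document model.
        expected = {}
        for w, p in p_doc.items():
            num = lam * p
            den = num + (1 - lam) * background.get(w, 1e-9)
            expected[w] = tf[w] * num / den
        norm = sum(expected.values())
        # M-step: renormalize, then prune low-probability terms.
        kept = {w: e / norm for w, e in expected.items() if e / norm >= threshold}
        if not kept:
            break
        z = sum(kept.values())
        p_doc = {w: p / z for w, p in kept.items()}
    return p_doc


def cooccurrence_counts(reduced_docs):
    """Count unordered term pairs that appear together in a pruned document."""
    counts = Counter()
    for terms in reduced_docs:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    doc = ["the"] * 6 + ["translation"] * 2 + ["model"] * 2
    background = {"the": 0.5, "translation": 0.001, "model": 0.002}
    reduced = parsimonious_terms(doc, background)
    print(reduced)
    print(cooccurrence_counts([reduced.keys()]))
```

Because the pair loop is quadratic in the number of terms per document, pruning each document to a handful of terms is what keeps the co-occurrence pass cheap in this sketch.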
