Query structuring and expansion with two-stage term dependence for Japanese web retrieval

In this paper, we propose a new term dependence model for information retrieval, built on a theoretical framework based on Markov random fields. We assume two types of dependencies among the terms of a query: (i) long-range dependencies, which may appear, for instance, within a passage or a sentence of a target document, and (ii) short-range dependencies, which may appear, for instance, within a compound word of a target document. Based on this assumption, our two-stage term dependence model treats long-range and short-range term dependencies separately when more than one compound word appears in a query. We also investigate how query structuring with term dependence can improve query expansion using a relevance model: the relevance model is constructed from the retrieval results of the structured query with term dependence and is then used to expand the query. Experiments on a 100-gigabyte test collection of web documents, mostly written in Japanese, show that our term dependence model works well, particularly when queries are structured using compound words. We also show that the performance of the relevance model can be significantly improved by using the structured query with our term dependence model.
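The two-stage idea can be sketched as a small query builder. This is only an illustration under stated assumptions, not the formulation used in the paper: it assumes an Indri-style structured query syntax (`#weight`, `#combine`, the exact-phrase operator `#1`, and the unordered-window operator `#uwN`), and the function name, the weights `(0.8, 0.1, 0.1)`, and the window size `8` are placeholder values chosen for the example. Short-range dependencies are expressed as exact phrases inside each compound word, and long-range dependencies as unordered windows over pairs of compound words.

```python
from itertools import combinations


def phrase(terms):
    """Short-range unit: a compound word's constituents as an exact phrase."""
    return terms[0] if len(terms) == 1 else "#1(" + " ".join(terms) + ")"


def build_two_stage_query(compounds, weights=(0.8, 0.1, 0.1), window=8):
    """Build an Indri-style structured query string (hypothetical helper).

    compounds: list of compound words, each a list of constituent terms.
    weights:   illustrative weights for the term, short-range, and
               long-range components (assumed values, not from the paper).
    window:    illustrative unordered-window width in terms (assumed).
    """
    # Component 1: all constituent terms treated independently.
    terms = [t for c in compounds for t in c]
    parts = [(weights[0], "#combine(" + " ".join(terms) + ")")]

    # Component 2 (short-range): exact phrase within each multi-term compound.
    short = [phrase(c) for c in compounds if len(c) > 1]
    if short:
        parts.append((weights[1], "#combine(" + " ".join(short) + ")"))

    # Component 3 (long-range): pairs of compounds co-occurring in a window.
    long_range = [
        f"#uw{window}({phrase(a)} {phrase(b)})"
        for a, b in combinations(compounds, 2)
    ]
    if long_range:
        parts.append((weights[2], "#combine(" + " ".join(long_range) + ")"))

    return "#weight(" + " ".join(f"{w} {q}" for w, q in parts) + ")"


if __name__ == "__main__":
    # Hypothetical query with two compound words (English tokens for readability).
    print(build_two_stage_query([["information", "retrieval"], ["web"]]))
```

When a query contains only one compound word, the long-range component is empty and is simply omitted, which mirrors the abstract's remark that the two stages matter when more than one compound word appears in a query.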
