Query Expansion with the Minimum User Feedback by Transductive Learning

Query expansion is an information retrieval technique that selects new query terms to improve search performance. Good terms can be extracted from documents whose relevance is already known, but in practice it is difficult to obtain enough such feedback from users. In this paper, we propose a query expansion method that performs well even when the user judges document relevance only until the first relevant document is found. To handle this condition, we introduce two refinements to a well-known query expansion method. The first applies transductive learning to increase the number of latent relevant documents. The second is a modified parameter estimation method that combines the predictions of multiple learning trials in order to differentiate the importance of candidate terms for expansion. Experimental results show that our method outperforms traditional methods when the initial search fails.
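The abstract does not spell out the term-scoring step, so what follows is only a sketch under stated assumptions. It assumes the "well-known query expansion method" is Robertson's offer-weight term selection, which scores a term as r(t) times its relevance weight. It models combining multiple learning trials as simply averaging the relevance counts r(t) and R across trials, so the counts become fractional. The per-trial sets of documents treated as relevant are taken as given; the transductive learner that would produce them is not implemented. The names `rank_expansion_terms`, `trials`, and `exclude` are hypothetical.

```python
import math
from collections import Counter


def relevance_weight(r, n, R, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term (fractional in this sketch)
    n: documents in the collection containing the term
    R: relevant documents (fractional in this sketch)
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5))
                    / ((n - r + 0.5) * (R - r + 0.5)))


def offer_weight(r, n, R, N):
    """Robertson's offer weight (term selection value): r * RW(t)."""
    return r * relevance_weight(r, n, R, N)


def rank_expansion_terms(docs, trials, top_k=10, exclude=frozenset()):
    """Rank candidate expansion terms from relevance counts averaged over trials.

    docs:    dict mapping doc_id -> set of terms (the collection).
    trials:  list of sets of doc_ids treated as relevant in each learning trial
             (hypothetical: the one user-judged relevant document plus the
             latent relevant documents predicted by one transductive run).
    exclude: terms to skip, e.g. the original query terms.
    """
    if not trials:
        raise ValueError("need at least one trial")
    N = len(docs)
    T = len(trials)
    # Document frequency n(t) over the whole collection.
    df = Counter(t for terms in docs.values() for t in terms)
    # Assumption: "combining trials" = averaging counts, which is not
    # necessarily the paper's estimator. Average relevant-set size -> R.
    R = sum(len(rel) for rel in trials) / T
    # Average number of relevant documents containing t -> r(t).
    r = Counter()
    for rel in trials:
        for d in rel:
            for t in docs[d]:
                r[t] += 1 / T
    scores = {t: offer_weight(r[t], df[t], R, N)
              for t in r if t not in exclude}
    # Highest offer weight first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    docs = {
        "d1": {"svm", "transductive", "query"},
        "d2": {"svm", "kernel"},
        "d3": {"query", "expansion", "feedback"},
        "d4": {"weather", "rain"},
    }
    # Two hypothetical trials: d1 is relevant in both, d2 only in the first.
    trials = [{"d1", "d2"}, {"d1"}]
    print(rank_expansion_terms(docs, trials, top_k=3))
    # ['svm', 'transductive', 'query']
```

In the toy run, `svm` occurs in a document treated as relevant in both trials and ranks first, while `kernel` occurs only in a document that a single trial counted as relevant and falls to fourth. Graded importance of this kind is what combining trials is meant to provide. Because each trial's counts satisfy r(t) ≤ n(t) and r(t) ≤ R, their averages do too, so every factor inside the logarithm stays positive.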

Seiji Yamada, et al. EMNLP 2005.