Paper Information - Improving the Reliability of Query Expansion for User-Generated Speech Retrieval Using Query Performance Prediction

The high variability in content and structure, combined with transcription errors, makes effective information retrieval (IR) from archives of spoken user-generated content (UGC) very challenging. Previous research has shown that using passage-level evidence for query expansion (QE) in IR can improve search effectiveness. Our investigation of passage-level QE for a large Internet collection of UGC demonstrates that, while it is effective for this task, the informal and variable nature of UGC means that different queries respond better to different types of passages or, in some cases, to whole documents rather than extracted passages. We investigate the use of Query Performance Prediction (QPP) to select the appropriate passage type for each query, and introduce a novel QPP method, Weighted Expansion Gain (WEG). Our experimental investigation, using an extended ad hoc search task based on the MediaEval 2012 Search task, shows the superiority of our proposed adaptive QE approach for retrieval. The effectiveness of this method is shown in a per-query evaluation of utilising passage and full-document evidence for QE within the inconsistent, uncertain settings of UGC retrieval.

Andy Way | Gareth J. F. Jones | Ahmad Khwileh
