Using Query Performance Predictors to Improve Spoken Queries

The goal of query performance prediction is to estimate a query’s retrieval effectiveness without user feedback. Past research has investigated the usefulness of query performance predictors for the task of reducing verbose textual queries. The basic idea is to automatically find a shortened version of the original query that yields better retrieval performance. To date, such techniques have been applied to TREC topic descriptions (as surrogates for verbose queries) and to long textual queries issued to a web search engine. In this paper, we build upon an existing query reduction approach that was applied to TREC topic descriptions and evaluate how well it generalizes to the new task of reducing spoken query transcriptions. Our results show that the reduced queries outperform the original spoken query by a small but significant margin. Furthermore, we show that the terms omitted from better-performing sub-queries include extraneous terms not central to the query topic, disfluencies, and speech recognition errors.
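The query reduction idea described above can be illustrated with a minimal sketch. This is not the approach evaluated in the paper: it assumes a single pre-retrieval predictor (average inverse document frequency), exhaustive enumeration of sub-queries, and hypothetical toy collection statistics. The names `avg_idf`, `best_subquery`, `DOC_FREQ`, and `NUM_DOCS` are introduced here for illustration only.

```python
import math
from itertools import combinations

# Hypothetical collection statistics (assumption: toy values, not from the paper).
NUM_DOCS = 1000
DOC_FREQ = {"um": 900, "find": 600, "cheap": 50, "flights": 40, "to": 950, "boston": 30}


def avg_idf(terms, doc_freq, num_docs):
    """Average inverse document frequency of the terms (a simple pre-retrieval predictor)."""
    if not terms:
        return 0.0
    total = sum(math.log(num_docs / (1 + doc_freq.get(t, 0))) for t in terms)
    return total / len(terms)


def best_subquery(query, doc_freq, num_docs, min_len=2):
    """Return the candidate (the full query or a sub-query) with the highest predictor score."""
    terms = query.split()
    candidates = [terms]  # the original query competes with its sub-queries
    for k in range(min_len, len(terms)):
        candidates.extend(list(c) for c in combinations(terms, k))
    return max(candidates, key=lambda c: avg_idf(c, doc_freq, num_docs))


reduced = best_subquery("um find cheap flights to boston", DOC_FREQ, NUM_DOCS)
```

With these toy statistics, the disfluency "um" and the function word "to" receive low scores and are dropped from the selected sub-query. Average IDF alone tends to favor very short sub-queries, which is one reason a practical system would likely combine several predictors rather than rely on a single score.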
