Handling verbose queries for spoken document retrieval

Query-by-example information retrieval gives users a flexible yet efficient way to describe their information needs accurately. The query exemplars are usually long and take the form of a partial or even a full document. However, they may contain extraneous terms that can degrade retrieval performance. To alleviate this problem, we propose a novel term-based query reduction mechanism that improves the informativeness of verbose query exemplars. We also explore the notion of term discrimination power to automatically select a salient subset of query terms. Experiments on the TDT Chinese collection show that the proposed approach is effective and promising.
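The abstract does not say how term discrimination power is computed. One way to illustrate the idea is the classic term discrimination value: measure how tightly the documents cluster around their centroid (average cosine similarity), then check how that density changes when a term is removed. A term whose removal makes documents look *more* alike is a good discriminator. The sketch below applies this to a verbose query and keeps the top-scoring terms. The function names, raw term-frequency weights, centroid-based density, and fixed `keep` cutoff are all assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def space_density(docs, exclude=None):
    """Average cosine similarity of each document to the collection centroid,
    optionally computed with one term removed from every vector."""
    if not docs:
        raise ValueError("need at least one document")
    vecs = [{t: w for t, w in d.items() if t != exclude} for d in docs]
    centroid = {}
    for v in vecs:
        for t, w in v.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(vecs)
    return sum(cosine(v, centroid) for v in vecs) / len(vecs)


def discrimination_values(docs, terms):
    """DV(t) = density without t minus density with t.
    DV > 0 means t spreads documents apart, i.e. t is a good discriminator."""
    base = space_density(docs)
    return {t: space_density(docs, exclude=t) - base for t in terms}


def reduce_query(query_terms, docs, keep):
    """Hypothetical reducer: keep the `keep` distinct query terms with the highest DV."""
    unique_terms = sorted(set(query_terms))
    dv = discrimination_values(docs, unique_terms)
    return sorted(unique_terms, key=lambda t: dv[t], reverse=True)[:keep]


# Toy collection of term-frequency vectors; "the" occurs everywhere, "stock" once.
docs = [Counter("the stock".split()), Counter("the rain".split()), Counter("the game".split())]
print(reduce_query(["the", "stock"], docs, keep=1))  # → ['stock']
```

Here the ubiquitous term "the" gets a negative discrimination value (removing it makes documents less similar), while "stock" gets a positive one, so only "stock" survives the reduction. Recomputing the density once per term is quadratic in practice; a real system would likely use a cheaper proxy, which is a separate design choice.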