Joint Annotation of Search Queries

Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations and exploit the dependencies between them to further improve annotation accuracy, even with a very limited amount of training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries and verbose natural language queries.
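The two-stage idea (independent annotations first, then a classifier stacked on top of them) can be illustrated with a small sketch. This is a hypothetical Python illustration, not the authors' implementation: it assumes that stage one estimates a capitalization score for each query term by counting capitalized occurrences in pseudo-relevance-feedback snippets (text from top-ranked documents), and that stage two is a plain logistic-regression classifier whose features are the independent scores of several annotation types. The function names, the add-one smoothing, the `p_noun` scores, and the toy training data are all illustrative assumptions; the paper's actual features and models may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


# Stage 1 (hypothetical sketch): an unsupervised, independent capitalization
# annotation. For each query term, count how often it appears capitalized in
# pseudo-relevance-feedback (PRF) text, i.e. snippets of top-ranked documents.
def prf_capitalization(query: str, snippets: Sequence[str]) -> Dict[str, float]:
    counts = {term: [0, 0] for term in query.lower().split()}  # [capitalized, total]
    for snippet in snippets:
        for raw in snippet.split():
            word = raw.strip(".,;:!?\"'()")
            key = word.lower()
            if key in counts:
                counts[key][1] += 1
                if word[:1].isupper():
                    counts[key][0] += 1
    # Add-one smoothing (an assumption): unseen terms get a neutral 0.5.
    return {t: (cap + 1) / (total + 2) for t, (cap, total) in counts.items()}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def features(scores: List[float]) -> List[float]:
    # Stage-2 features: a bias plus the term's independent scores for *all*
    # annotation types, so one annotation type can inform another.
    return [1.0] + scores


# Stage 2 (hypothetical sketch): a logistic-regression classifier stacked on
# the independent annotations, trained with plain SGD on a small labeled set.
def train_stacked(
    examples: List[Tuple[List[float], int]], epochs: int = 500, lr: float = 0.5
) -> List[float]:
    w = [0.0] * len(features(examples[0][0]))
    for _ in range(epochs):
        for scores, gold in examples:
            x = features(scores)
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            # Gradient step on the log-likelihood of the gold label.
            w = [wj + lr * (gold - p) * xj for wj, xj in zip(w, x)]
    return w


def predict(w: List[float], scores: List[float]) -> float:
    # Probability that the term receives the positive label (e.g. "capitalized").
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(scores))))


caps = prf_capitalization(
    "new york pizza", ["New York has great pizza.", "Best pizza in New York"]
)
# caps == {"new": 0.75, "york": 0.75, "pizza": 0.25}

# Toy training data (hypothetical): [p_cap, p_noun] per term -> gold label.
train = [([0.9, 0.8], 1), ([0.9, 0.9], 1), ([0.1, 0.7], 0),
         ([0.2, 0.1], 0), ([0.85, 0.9], 1)]
w = train_stacked(train)
print(predict(w, [0.95, 0.9]), predict(w, [0.05, 0.2]))
```

The stacking step is what lets a weak independent signal (here, a part-of-speech score) adjust the final decision for another annotation type; only the small second-stage model needs labeled data, which matches the abstract's point about limited training data.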
