Learning a Replacement Model for Query Segmentation with Consistency in Search Logs

Query segmentation splits a query into a sequence of non-overlapping segments that together cover all of its tokens. Most existing methods are unsupervised, but they are usually less accurate than supervised methods because they lack guidance from labeled data. In this paper, we propose a new paradigm, learning a replacement model with consistency (LRMC), which enables unsupervised training with guidance from search log data. In LRMC, we first assume the existence of a base segmenter (an implementation of any existing approach). We then exploit a key observation, that queries with similar intent tend to have consistent segmentations, to automatically collect labeled data from the outputs of the base segmenter by leveraging search log data. Finally, we use the auto-collected data to train a replacement model that selects the correct segmentation of a new query from the outputs of the base segmenter. The results show that LRMC improves on state-of-the-art methods by around 7% in F-score.
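The consistency observation can be illustrated with a minimal sketch. It is not the paper's actual procedure: it assumes that queries have already been grouped by similar intent (for example via search log signals), that the base segmenter returns several candidate segmentations per query, and that agreement is measured by counting shared multi-word segments. The function names and the scoring rule are hypothetical.

```python
from collections import Counter


def multiword_segments(segmentation):
    """Return the set of segments that contain more than one token."""
    return {seg for seg in segmentation if " " in seg}


def collect_consistent_labels(group):
    """Pick, for each query, the candidate best supported by the other queries.

    `group` maps a query string to a list of candidate segmentations, each a
    tuple of segment strings produced by some base segmenter (assumed input).
    Queries with no candidate supported by another query are left unlabeled.
    """
    # For each multi-word segment, count how many queries propose it
    # in at least one of their candidates.
    support = Counter()
    for candidates in group.values():
        proposed = set()
        for cand in candidates:
            proposed |= multiword_segments(cand)
        support.update(proposed)

    labels = {}
    for query, candidates in group.items():
        def score(cand):
            # Subtract 1 so a query's own vote does not count as agreement.
            return sum(support[s] - 1 for s in multiword_segments(cand))

        best = max(candidates, key=score)
        if score(best) > 0:
            labels[query] = best
    return labels


# Hypothetical group of same-intent queries with base-segmenter candidates.
example_group = {
    "new york times subscription": [
        ("new york", "times", "subscription"),
        ("new york times", "subscription"),
    ],
    "new york times crossword": [
        ("new york times", "crossword"),
        ("new", "york times", "crossword"),
    ],
}
auto_labels = collect_consistent_labels(example_group)
```

Here both queries end up labeled with the candidate containing the segment "new york times", since that is the only segment proposed for both. Pairs of (query, selected candidate) collected this way could then serve as training data for a model that chooses among the base segmenter's outputs.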
