Sequence clustering and labeling for unsupervised query intent discovery

One popular form of semantic search in modern search engines is the recognition of query patterns that trigger instant answers or domain-specific search, producing semantically enriched search results. To access structured data sources, this often requires understanding the query intent in addition to the meaning of the query terms. A major challenge in intent understanding is to construct a domain-dependent schema and to annotate search queries based on such a schema, a process that to date has required much manual annotation effort. We present an unsupervised method for clustering queries with similar intent and for producing, for each intent, a pattern consisting of a sequence of semantic concepts and/or lexical items. Furthermore, we leverage the discovered intent patterns to automatically annotate a large number of queries beyond those used in clustering. We evaluated our method on 10 selected domains, discovering over 1,400 intent patterns and automatically annotating 125K (and potentially many more) queries. We found that over 90% of the patterns and 80% of the instance annotations tested were judged to be correct by a majority of annotators.
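To make the notion of an intent pattern concrete, the following toy sketch shows a pattern as a sequence of semantic concepts and lexical items, and how such patterns could be reused to annotate unseen queries. It is not the method of the paper: the lexicon `LEXICON`, the `<city>` label, and the helper names `to_pattern`, `group_by_pattern`, and `annotate` are all hypothetical, and exact pattern matching stands in here for the unsupervised clustering step, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical concept lexicon, assumed only for illustration.
LEXICON = {
    "seattle": "<city>",
    "boston": "<city>",
    "denver": "<city>",
}


def to_pattern(query):
    """Map a query to a sequence of concept labels and lexical items.

    Tokens found in the lexicon become their concept label; all other
    tokens are kept as plain lexical items.
    """
    return tuple(LEXICON.get(tok, tok) for tok in query.lower().split())


def group_by_pattern(queries):
    """Group queries that share the same pattern (a stand-in for clustering)."""
    groups = defaultdict(list)
    for q in queries:
        groups[to_pattern(q)].append(q)
    return dict(groups)


def annotate(query, known_patterns):
    """Return the query's pattern if it matches a known one, else None."""
    pattern = to_pattern(query)
    return pattern if pattern in known_patterns else None


queries = [
    "flights from seattle to boston",
    "flights from boston to denver",
    "weather in seattle",
]
groups = group_by_pattern(queries)
for pattern, members in groups.items():
    print(" ".join(pattern), "<-", members)
```

In this sketch the two flight queries collapse into one pattern, and a query that was not in the original set, such as "flights from denver to seattle", can be annotated by calling `annotate` with the set of discovered patterns.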
