Jigs and Lures: Associating Web Queries with Structured Entities

We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query-entity click graph that blends general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable and that, overall, our top-performing model affects 9% of general web queries with 94% precision.
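As a rough illustration of what interpolating a click-based estimate with a query synonymy model might look like, the sketch below computes a maximum-likelihood estimate of P(entity | query) from click counts and mixes it with a weighted average over synonym queries. It is a generic linear-interpolation sketch and not the paper's actual formulation: the function names, the `synonyms` weight map, the mixing weight `lam`, and the handling of unseen queries are all assumptions made for this example.

```python
def mle_association(clicks):
    """Maximum-likelihood P(entity | query) from raw click counts.

    clicks: dict mapping query -> dict mapping entity -> click count.
    """
    probs = {}
    for query, entity_counts in clicks.items():
        total = sum(entity_counts.values())
        if total > 0:
            probs[query] = {e: c / total for e, c in entity_counts.items()}
    return probs


def smoothed_association(query, entity, probs, synonyms, lam=0.7):
    """Linearly interpolate the direct estimate with a synonym-based one.

    synonyms: hypothetical map query -> {synonym query: similarity weight}.
    lam: assumed weight on the direct estimate (not a value from the paper).
    """
    weights = synonyms.get(query, {})
    norm = sum(weights.values())
    backoff = 0.0
    if norm > 0:
        # Weighted average of the synonyms' own click-based estimates.
        backoff = sum(
            w * probs.get(syn, {}).get(entity, 0.0) for syn, w in weights.items()
        ) / norm

    if query not in probs:
        # No click evidence for this query: rely on the synonyms alone.
        return backoff
    direct = probs[query].get(entity, 0.0)
    if norm == 0:
        # No synonyms known: nothing to interpolate with.
        return direct
    return lam * direct + (1 - lam) * backoff
```

With this kind of interpolation, a sparse query that has clicks on only one entity can still receive nonzero probability for entities clicked under its better-attested synonyms, which is the coverage effect that smoothing aims for.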
