Learning latent semantic relations from clickthrough data for query suggestion

Given a query issued by a specific user, query suggestion aims to recommend relevant queries that are likely to suit that user's information needs. Because the Web's structure is complex and users' inputs are often ambiguous, most suggestion algorithms suffer from poor recommendation accuracy. In this paper, aiming to provide semantically relevant queries for users, we develop a novel, effective, and efficient two-level query suggestion model that mines clickthrough data. The data are represented as two bipartite graphs extracted from the clickthrough logs: a user-query graph and a query-URL graph. We first propose a joint matrix factorization method that uses both bipartite graphs to learn a low-rank query latent feature space, and then build a query similarity graph from the learned features. We then design an online ranking algorithm that propagates similarities over the query similarity graph and recommends latent semantically relevant queries to users. Experimental analysis on the clickthrough data of a commercial search engine shows the effectiveness and efficiency of our method.
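
As a rough illustration of the pipeline described above, the following sketch factorizes a toy user-query matrix and a toy query-URL matrix with a shared query factor, builds a similarity graph from the learned query features, and propagates similarity from a seed query. The abstract does not give the objective function, the similarity measure, or the ranking algorithm, so everything here is an assumption made for illustration: the squared-error loss with L2 regularization, plain gradient descent, cosine similarity, a random walk with restart as the propagation step, and all function names, parameters, and data.

```python
import numpy as np


def joint_factorize(R, S, rank=2, lam=0.01, lr=0.02, steps=2000, seed=0):
    """Fit R (users x queries) ~ U @ Q.T and S (queries x URLs) ~ Q @ L.T.

    Q is shared by both factorizations. The squared-error objective and
    gradient descent are assumptions, not the paper's stated method.
    """
    rng = np.random.default_rng(seed)
    n_users, n_queries = R.shape
    n_urls = S.shape[1]
    U = 0.1 * rng.standard_normal((n_users, rank))
    Q = 0.1 * rng.standard_normal((n_queries, rank))
    L = 0.1 * rng.standard_normal((n_urls, rank))
    for _ in range(steps):
        err_r = U @ Q.T - R  # residual on the user-query graph
        err_s = Q @ L.T - S  # residual on the query-URL graph
        grad_U = err_r @ Q + lam * U
        grad_Q = err_r.T @ U + err_s @ L + lam * Q  # Q receives both signals
        grad_L = err_s.T @ Q + lam * L
        U -= lr * grad_U
        Q -= lr * grad_Q
        L -= lr * grad_L
    return U, Q, L


def similarity_graph(Q):
    """Cosine similarity between query feature vectors (assumed measure)."""
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    Qn = Q / np.maximum(norms, 1e-12)
    W = np.clip(Qn @ Qn.T, 0.0, None)  # keep only non-negative edge weights
    np.fill_diagonal(W, 0.0)           # no self-loops
    return W


def suggest(W, query_idx, alpha=0.85, iters=50, top_k=3):
    """Rank queries by a random walk with restart from query_idx.

    This stands in for the paper's online ranking algorithm, which the
    abstract does not specify.
    """
    col_sums = W.sum(axis=0, keepdims=True)
    P = W / np.maximum(col_sums, 1e-12)  # column-normalized transitions
    restart = np.zeros(W.shape[0])
    restart[query_idx] = 1.0
    scores = restart.copy()
    for _ in range(iters):
        scores = alpha * (P @ scores) + (1.0 - alpha) * restart
    scores[query_idx] = -np.inf  # never suggest the seed query itself
    return [int(i) for i in np.argsort(-scores)[:top_k]]


# Toy data (made up): queries 0 and 1 share users and URLs, as do 2 and 3.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
S = R.copy()

U, Q, L = joint_factorize(R, S)
W = similarity_graph(Q)
suggestions = suggest(W, query_idx=0, top_k=1)
```

In this toy setup the shared factor Q is what ties the two graphs together: a query's feature vector has to explain both who issued it and which URLs were clicked for it, so queries with similar users and similar clicks end up close in the latent space.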
