Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts

The problem of learning user search intents has attracted intensive attention from both industry and academia. However, state-of-the-art intent learning algorithms suffer from different drawbacks when they rely on a single type of data source. For example, query text alone has difficulty distinguishing ambiguous queries, and search logs are biased by the order of search results and by users' noisy click behavior. In this work, for the first time, we jointly leverage three types of objects, namely queries, web pages, and Wikipedia concepts, to learn generic search intents, and we construct a heterogeneous graph to represent the multiple types of relationships among them. A novel unsupervised method called heterogeneous graph-based soft-clustering is developed to derive an intent indicator for each object based on the constructed heterogeneous graph. With the proposed co-clustering method, one can improve the quality of intent understanding by taking advantage of different types of data that complement each other, and make the implicit intents easier to interpret using explicit knowledge from Wikipedia concepts. Experiments on two real-world datasets demonstrate the effectiveness of the proposed method: it achieves a 9.25% improvement in NDCG on a search ranking task and a 4.67% improvement in Rand index on an object co-clustering task, compared with the best state-of-the-art method.
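
The two evaluation measures named above, Rand index and NDCG, can be illustrated with a minimal Python sketch. The function names `rand_index` and `ndcg` are illustrative and do not come from the paper, and the NDCG variant shown (linear gain, log2 discount) is an assumption; the authors may have used a different formulation. The Rand index here compares hard cluster assignments, so a soft intent indicator would first have to be reduced to one label per object, for example by taking its largest component.

```python
from itertools import combinations
from math import log2


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two hard clusterings agree.

    A pair "agrees" when both clusterings put the two objects in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def dcg(relevances):
    """Discounted cumulative gain with linear gain (an assumed variant)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of a ranked relevance list divided by the DCG of its ideal ordering."""
    ranked = list(relevances)
    ideal = sorted(ranked, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    # A list with no relevant results has no meaningful ranking quality.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

The Rand index depends only on how objects are grouped, not on the cluster label values, so `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect match. NDCG equals 1 only when results are already sorted by decreasing relevance.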
