Learning query intent from regularized click graphs

This work uses click graphs to improve query intent classifiers, which are critical if vertical search and general-purpose search services are to be offered in a unified user interface. Previous work on query classification has focused primarily on improving the feature representation of queries, e.g., by augmenting queries with search engine results. We investigate an orthogonal approach: instead of enriching the feature representation, we aim to drastically increase the amount of training data through semi-supervised learning with click graphs. Specifically, we infer the class memberships of unlabeled queries from those of labeled ones according to their proximity in a click graph. Moreover, we regularize the click-graph learning with content-based classification to avoid propagating erroneous labels. We demonstrate the effectiveness of our algorithms in two applications, product intent and job intent classification. In both cases, we expand the training data with automatically labeled queries by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that, with a large amount of training data obtained in this fashion, classifiers that use only query words and phrases as features can work remarkably well.
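
To make the idea of inferring labels from proximity in a click graph concrete, the following is a minimal sketch of label propagation over a bipartite query-URL graph. It is an illustration under stated assumptions, not the algorithm of this work: the function name `propagate_labels`, the mixing weight `alpha`, the neutral default score of 0.5, and the toy click counts and URLs are all hypothetical. The optional `prior` argument stands in for the output of a content-based classifier, which here simply pulls each unlabeled query back toward its content-based score so that the graph alone does not decide its label.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, prior=None, alpha=0.8, iterations=20):
    """Spread positive-class scores from labeled to unlabeled queries.

    clicks: dict mapping (query, url) -> click count
    seeds:  dict mapping labeled query -> 1.0 (positive) or 0.0 (negative)
    prior:  optional dict mapping query -> content-based score in [0, 1]
            (hypothetical stand-in for a content-based classifier)
    alpha:  weight given to the click-graph neighborhood (assumed value)
    """
    prior = prior or {}
    query_to_urls = defaultdict(dict)
    url_to_queries = defaultdict(dict)
    for (query, url), count in clicks.items():
        query_to_urls[query][url] = count
        url_to_queries[url][query] = count

    def start(query):
        # Labeled queries start at their label; others at the prior or 0.5.
        return seeds.get(query, prior.get(query, 0.5))

    scores = {query: start(query) for query in query_to_urls}
    for _ in range(iterations):
        # Each URL takes the click-weighted average of its queries' scores.
        url_scores = {
            url: sum(c * scores[q] for q, c in qs.items()) / sum(qs.values())
            for url, qs in url_to_queries.items()
        }
        new_scores = {}
        for query, urls in query_to_urls.items():
            if query in seeds:
                new_scores[query] = seeds[query]  # labeled queries stay fixed
                continue
            neighborhood = (
                sum(c * url_scores[u] for u, c in urls.items())
                / sum(urls.values())
            )
            # Mix the graph evidence with the content-based starting score.
            new_scores[query] = alpha * neighborhood + (1 - alpha) * start(query)
        scores = new_scores
    return scores


# Toy click log with made-up queries, URLs, and counts.
clicks = {
    ("buy laptop", "shop.example/laptops"): 10,
    ("cheap notebook pc", "shop.example/laptops"): 5,
    ("laptop history", "wiki.example/laptop"): 8,
    ("who invented the laptop", "wiki.example/laptop"): 4,
}
seeds = {"buy laptop": 1.0, "laptop history": 0.0}
scores = propagate_labels(clicks, seeds)
```

In this toy example, the unlabeled query that shares clicks with the positively labeled seed ends up with a score above 0.5, and the one that shares clicks with the negatively labeled seed ends up below 0.5. Queries scored this way could then be thresholded and added to the training set of a classifier over query words and phrases.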
