Learning search tasks in queries and web pages via graph regularization

As the Internet grows explosively, search engines play an increasingly important role in helping users access online information effectively. Recently, it has been recognized that a query is often triggered by a search task that the user wants to accomplish. Similarly, many web pages are specifically designed to help accomplish a certain task. Therefore, learning the hidden tasks behind queries and web pages can help search engines return the most useful web pages to users by task matching. For instance, the search task that triggers the query "thinkpad T410 broken" is to maintain a computer, and it is desirable for a search engine to return the Lenovo troubleshooting page at the top of the list. However, existing search engine technologies mainly focus on topic detection or relevance ranking, and cannot predict the task that triggers a query or the task that a web page can accomplish. In this paper, we propose to simultaneously classify queries and web pages into popular search tasks by exploiting their content together with click-through logs. Specifically, we construct a task-oriented heterogeneous graph among queries and web pages. Each pair of objects in the graph is linked as long as the two objects potentially share similar search tasks. A novel graph-based regularization algorithm is designed for search task prediction by leveraging the graph. Extensive experiments on real search log data demonstrate the effectiveness of our method over state-of-the-art classifiers, and show that search performance can be significantly improved by using the task prediction results as additional information.
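As a rough illustration of the general idea of predicting tasks over a query and web page graph, the sketch below runs a standard label-propagation iteration (in the style of learning with local and global consistency) on a tiny click graph. It is not the regularization algorithm proposed in the paper, which the abstract does not specify. The function name `propagate_tasks`, the toy edges, the two task labels, and the value of `alpha` are all assumptions made for illustration.

```python
import numpy as np

def propagate_tasks(W, Y, alpha=0.8, iters=200):
    """Generic label propagation (an assumed stand-in, not the paper's method).

    Iterates F <- alpha * S @ F + (1 - alpha) * Y,
    where S = D^{-1/2} W D^{-1/2} is the symmetrically normalized adjacency.
    """
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F

# Hypothetical toy graph: nodes 0-2 are queries, nodes 3-4 are web pages.
# An edge means the query led to a click on the page.
n = 5
W = np.zeros((n, n))
for q, p in [(0, 3), (1, 3), (2, 4)]:
    W[q, p] = W[p, q] = 1.0

# Two hypothetical tasks: column 0 = "maintain", column 1 = "buy".
# Only query 0 and query 2 carry labels; everything else is unlabeled.
Y = np.zeros((n, 2))
Y[0, 0] = 1.0
Y[2, 1] = 1.0

F = propagate_tasks(W, Y)
predicted = F.argmax(axis=1)  # predicted task index per node
print(predicted.tolist())
```

In this toy graph, query 1 and page 3 inherit the "maintain" task from query 0 through their shared clicks, while page 4 inherits "buy" from query 2, so the printed list is `[0, 0, 1, 0, 1]`.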
