Automatic identification of user goals in Web search

There has been recent interest in studying the "goal" behind a user's Web query, so that this goal can be used to improve the quality of a search engine's results. Previous studies have mainly relied on manual investigation of query logs to identify Web query goals. In this paper we study whether and how this goal-identification process can be automated. We first present results from a human subject study that strongly indicate the feasibility of automatic query-goal identification. We then propose two types of features for the goal-identification task: user-click behavior and anchor-link distribution. Our experimental evaluation shows that by combining these features we can correctly identify the goals for 90% of the queries studied.
