Using query logs and click data to create improved document descriptions

Search engine logfiles are a promising resource for data mining, since they provide raw data associated with both users and web documents. In this paper we focus on the latter and explore how the information in logfiles can be used to improve document descriptions. A pilot experiment showed that document descriptors extracted from the queries linked to a document by clicks provide useful semantic information that complements the descriptors extracted from the full text of the web page.
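To make the idea of click-based descriptors concrete, the following is a minimal sketch, not the method used in the pilot experiment. It assumes the log has already been reduced to (query, clicked URL) pairs and simply ranks query terms by frequency per document; the sample log, the function name `query_descriptors`, and the `top_k` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click log of (query, clicked_url) pairs; not data from the paper.
CLICK_LOG = [
    ("cheap flights paris", "example.org/flights"),
    ("paris flights", "example.org/flights"),
    ("eiffel tower tickets", "example.org/eiffel"),
]


def query_descriptors(click_log, top_k=3):
    """Return the top_k most frequent query terms for each clicked document."""
    term_counts = defaultdict(Counter)
    for query, url in click_log:
        # Assumption: lowercasing and whitespace splitting stand in for real tokenization.
        term_counts[url].update(query.lower().split())
    return {
        url: [term for term, _ in counts.most_common(top_k)]
        for url, counts in term_counts.items()
    }


descriptors = query_descriptors(CLICK_LOG)
```

In practice such query-derived terms would be compared with, or merged into, the descriptors extracted from the page's full text; stop-word removal and weighting are left out here for brevity.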
