Contextual Ranking of Keywords Using Click Data

The problem of automatically extracting the most interesting and relevant keyword phrases from a document has been studied extensively, as it is crucial for a number of applications, including contextual advertising, automatic text summarization, and user-centric entity detection systems. All of these applications can benefit from a successful solution, which offers computational efficiency (by decreasing the input size), noise reduction, or improved overall user satisfaction.

In this paper, we study this problem with a focus on improving the overall quality of user-centric entity detection systems. First, we review our concept extraction technique, which relies on search engine query logs. We then define a new feature space to represent the interestingness of concepts and describe a new approach to estimating their relevance in a given context. We use click-through data obtained from a large-scale user-centric entity detection system, Contextual Shortcuts, to train a model that ranks the extracted concepts, and we evaluate the resulting model extensively, again using click-through data from that system. Our results show that the learned model outperforms the baseline model, which employs similar features but whose weights were carefully tuned based on empirical observations, reducing the error rate from 30.22% to 18.66%.
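To make the idea of learning a concept ranker from click-through data concrete, the following is a minimal sketch rather than the method used in the paper. It assumes that concepts shown in the same document can be compared by click-through rate, that each concept has a small numeric feature vector, and that a simple pairwise perceptron stands in for whatever learner the authors actually trained. The feature values, the function names, and the use of a pairwise error rate as the evaluation measure are all hypothetical choices made for illustration.

```python
from itertools import combinations


def pairwise_preferences(concepts):
    """Build (preferred, other) feature-vector pairs for ONE document.

    `concepts` is a list of (features, views, clicks) tuples. A concept is
    preferred over another when its click-through rate is strictly higher.
    Concepts with zero views are skipped (assumption: no evidence, no pair).
    """
    seen = [(f, c / v) for f, v, c in concepts if v > 0]
    pairs = []
    for (fa, ctr_a), (fb, ctr_b) in combinations(seen, 2):
        if ctr_a > ctr_b:
            pairs.append((fa, fb))
        elif ctr_b > ctr_a:
            pairs.append((fb, fa))
    return pairs


def score(w, x):
    """Linear ranking score: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(pairs, dim, epochs=50, lr=0.1):
    """Learn weights so that preferred concepts score above the others."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the pair is ranked incorrectly (or tied).
            if score(w, better) - score(w, worse) <= 0:
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w


def pairwise_error_rate(w, pairs):
    """Fraction of preference pairs that the model ranks incorrectly."""
    wrong = sum(1 for better, worse in pairs if score(w, better) <= score(w, worse))
    return wrong / len(pairs)


# Hypothetical toy data: (features, views, clicks) for three concepts
# detected in one document. The two features are illustrative stand-ins
# for an "interestingness" signal and a "relevance" signal.
doc = [
    ([0.9, 0.8], 1000, 50),
    ([0.2, 0.3], 1000, 5),
    ([0.5, 0.9], 1000, 30),
]

pairs = pairwise_preferences(doc)
w = train_pairwise_perceptron(pairs, dim=2)
ranked = sorted(doc, key=lambda c: score(w, c[0]), reverse=True)
print("weights:", w)
print("pairwise error rate:", pairwise_error_rate(w, pairs))
```

In this sketch a hand-tuned baseline would correspond to fixing `w` manually and calling `pairwise_error_rate` on the same pairs, which gives a like-for-like comparison between tuned and learned weights under the stated assumptions.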
