Web Page Summarization for Just-in-Time Contextual Advertising

Contextual advertising is a type of Web advertising that, given the URL of a Web page, aims to embed into the page the most relevant textual ads available. For static pages that are displayed repeatedly, ads can be matched on the basis of a prior analysis of the entire page content; often, however, ads need to be matched to new or dynamically created pages that cannot be processed ahead of time. Analyzing the entire content of such pages on the fly entails prohibitive communication and latency costs. To resolve the three-horned dilemma of low relevance, high latency, or high load, we propose to use text summarization techniques paired with external knowledge (exogenous to the page) to craft short page summaries in real time. Empirical evaluation shows that matching ads on the basis of such summaries does not sacrifice relevance and is competitive with matching based on the entire page content. Specifically, we found that analyzing a carefully selected 6% of the page text sacrifices only 1%–3% in ad relevance. Furthermore, our summaries are fully compatible with the standard JavaScript mechanisms used for ad placement: they can be produced at ad-display time by simple additions to the usual script, and they add only 500–600 bytes to the usual request. We also compared our summarization approach, which is based on structural properties of the HTML content of the page, with a more principled one based on a standard text summarization tool (MEAD), and found their performance to be comparable.
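
As a rough illustration of what a structure-based page summary might look like, the sketch below collects text from a few prominent HTML elements and truncates the result to a small byte budget. It is not the method evaluated in the paper: the choice of elements (title, meta keywords and description, h1/h2 headings), the inclusion of the URL, and the names `StructuralSummarizer` and `summarize` are assumptions made for this example. Only the 600-byte budget echoes the figure quoted above.

```python
from html.parser import HTMLParser


class StructuralSummarizer(HTMLParser):
    """Collects text from a few structurally prominent elements.

    The set of elements is an assumption for illustration only.
    """

    PROMINENT = {"title", "h1", "h2"}

    def __init__(self):
        super().__init__()
        self.parts = []    # text fragments in document order
        self._open = []    # stack of currently open prominent tags

    def handle_starttag(self, tag, attrs):
        if tag in self.PROMINENT:
            self._open.append(tag)
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").strip()
            if name in ("keywords", "description") and content:
                self.parts.append(content)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._open and text:
            self.parts.append(text)


def summarize(url, html, max_bytes=600):
    """Return a short summary string of at most max_bytes UTF-8 bytes."""
    parser = StructuralSummarizer()
    parser.feed(html)
    parser.close()
    summary = " ".join([url] + parser.parts)
    # Truncate on a byte budget; drop any partial trailing character.
    return summary.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


page = (
    "<html><head><title>Cheap Flights</title>"
    '<meta name="keywords" content="travel, airfare"></head>'
    "<body><h1>Book now</h1><p>" + "filler " * 500 + "</p></body></html>"
)
print(summarize("http://example.com/flights", page))
```

In the setting the abstract describes, the equivalent logic would run inside the ad-placement JavaScript in the browser, so that only the short summary travels with the ad request instead of the full page.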
