Topic Pages: An Alternative to the Ten Blue Links

We investigate the automatic generation of topic pages as an alternative to the current Web search paradigm. Topic pages explicitly aggregate information across documents, filter redundancy, and promote diversity of topical aspects. We propose a novel framework for building rich topical aspect models and selecting diverse information from the Web. In particular, we use Web search logs to build aspect models with various degrees of specificity, and then employ these aspect models as input to a sentence selection method that identifies relevant and non-redundant sentences from the Web. Automatic and manual evaluations on biographical topics show that topic pages built by our system compare favorably, on all metrics employed, both to regular Web search results and to multi-document summarization (MDS) style summaries of those results.
