Ask the Crowd to Find out What's Important

We present a corpus-based method for estimating the importance of sentences. Our main contribution is two-fold. First, we introduce the idea of using the increasing amount of manually labeled category information (that is becoming available through collaborative knowledge creation efforts) to identify "typical information" for categories of entities. Second, we provide multiple types of empirical evidence for the usefulness of this notion of typical-information-for-a-category for estimating the importance of sentences.
