Estimating Importance Features for Fact Mining (With a Case Study in Biography Mining)

We present a transparent model for ranking sentences that combines topic relevance with an aboutness feature and an importance feature. We describe and compare five methods for estimating the importance feature. The two key sources of evidence are graph-based ranking and ranking against reference corpora of sentences known to be important. Used independently, these features do not improve over the baseline; combined, they do. While our experimental evaluation focuses on informational queries about people, our importance estimation methods are completely general and can be applied to any topic.
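To make the idea of combining the two kinds of importance evidence concrete, the following is a minimal sketch and not the model evaluated in the paper. It assumes bag-of-words cosine similarity, a PageRank-style centrality over the sentence similarity graph, the maximum similarity to any reference sentence as the reference-corpus score, and a simple linear interpolation. All function names, the `weight` parameter, and the example sentences are hypothetical.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:!?\"'()").lower() for w in sentence.split())
    return [w for w in stripped if w]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_scores(vectors, damping=0.85, iterations=50):
    """PageRank-style centrality on a weighted sentence similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[j] * sim[j][i] / row_sums[j]
                for j in range(n) if row_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def reference_scores(vectors, reference_vectors):
    """Assumed scoring: highest similarity to any known-important sentence."""
    return [max((cosine(v, r) for r in reference_vectors), default=0.0)
            for v in vectors]


def normalize(values):
    """Scale scores so the largest is 1.0 (all zeros stay zero)."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def rank_sentences(sentences, reference_sentences, weight=0.5):
    """Rank sentences by an interpolation of graph and reference evidence."""
    vectors = [Counter(tokens(s)) for s in sentences]
    references = [Counter(tokens(s)) for s in reference_sentences]
    graph = normalize(graph_scores(vectors))
    reference = normalize(reference_scores(vectors, references))
    combined = [weight * g + (1 - weight) * r
                for g, r in zip(graph, reference)]
    order = sorted(range(len(sentences)),
                   key=lambda i: combined[i], reverse=True)
    return [(sentences[i], combined[i]) for i in order]


candidates = [
    "Ada Lovelace was born in London in 1815.",
    "Ada Lovelace wrote the first published algorithm.",
    "The weather was mild that year.",
]
known_important = ["She was born in Paris in 1900."]
ranking = rank_sentences(candidates, known_important)
```

Setting `weight` to 1.0 or 0.0 isolates the graph-based or the reference-based evidence, which gives a simple way to compare each source alone against the combination.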
