Using Random Walks for Question-focused Sentence Retrieval

We consider the problem of question-focused sentence retrieval from complex news articles describing multi-event stories published over time. Annotators generated a list of questions central to understanding each story in our corpus. Because of the dynamic nature of the stories, many questions are time-sensitive (e.g., "How many victims have been found?"). Judges then identified the sentences that answer each question. To address the sentence retrieval problem, we apply a stochastic, graph-based method for comparing the relative importance of textual units, which was previously used successfully for generic summarization. Here, we present a topic-sensitive version of our method and hypothesize that it can outperform a competitive baseline, which compares the similarity of each sentence to the input question via IDF-weighted word overlap. In our experiments, the method achieves a Total Reciprocal Document Rank (TRDR) score that is significantly higher than that of the baseline.
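To make the two approaches concrete, the following is a minimal sketch, not the exact formulation used in the experiments. It assumes a smoothed IDF, a log-scaled term-frequency overlap for the baseline, an IDF-weighted cosine as the sentence-to-sentence similarity, and a mixing weight `bias_weight` between question relevance and the graph walk. All function names, the tokenizer, and the parameter values are hypothetical choices made for illustration.

```python
import math
from collections import Counter

PUNCT = '.,;:?!"\'()'

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; any tokenizer could be used).
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def idf_table(sentences):
    # Smoothed IDF over the sentence collection (assumed form).
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    return {w: math.log((n + 1) / (0.5 + c)) for w, c in df.items()}

def relevance(sentence, question, idf):
    # Baseline: IDF-weighted word overlap between sentence and question.
    s, q = Counter(tokenize(sentence)), Counter(tokenize(question))
    return sum(math.log(s[w] + 1) * math.log(q[w] + 1) * idf.get(w, 0.0)
               for w in q)

def cosine(a, b, idf):
    # IDF-weighted cosine similarity between two sentences.
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[w] * tb[w] * idf.get(w, 0.0) ** 2 for w in ta)
    na = math.sqrt(sum((c * idf.get(w, 0.0)) ** 2 for w, c in ta.items()))
    nb = math.sqrt(sum((c * idf.get(w, 0.0)) ** 2 for w, c in tb.items()))
    return dot / (na * nb) if na and nb else 0.0

def rank_sentences(sentences, question, bias_weight=0.85, iters=100):
    # Topic-sensitive random walk: at each step, jump to a sentence in
    # proportion to its question relevance with probability bias_weight,
    # otherwise follow a similarity-weighted edge of the sentence graph.
    n = len(sentences)
    idf = idf_table(sentences)
    rel = [relevance(s, question, idf) for s in sentences]
    total = sum(rel)
    bias = [r / total for r in rel] if total > 0 else [1.0 / n] * n
    sim = [[cosine(a, b, idf) for b in sentences] for a in sentences]
    rows = [sum(row) for row in sim]

    def step_prob(j, i):
        # Transition probability from sentence j to sentence i;
        # uniform if j has no similar sentences at all.
        return sim[j][i] / rows[j] if rows[j] > 0 else 1.0 / n

    p = [1.0 / n] * n
    for _ in range(iters):
        p = [bias_weight * bias[i]
             + (1 - bias_weight) * sum(step_prob(j, i) * p[j] for j in range(n))
             for i in range(n)]
    return p

sentences = [
    "Rescuers have found twelve victims so far.",
    "The storm hit the coast on Monday.",
    "Officials said more victims may be found.",
    "Schools will reopen next week.",
]
question = "How many victims have been found?"
scores = rank_sentences(sentences, question)
for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")
```

In this sketch the scores form a probability distribution over sentences, so they can be sorted directly to produce a ranked list. Setting `bias_weight` to 1 reduces the ranking to the normalized baseline relevance, while lower values let sentences that resemble relevant sentences gain score through the graph.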
