Summarizing Definition from Wikipedia

Wikipedia provides a wealth of knowledge: the first sentence, the infobox (and relevant sentences), and even the entire document of a wiki article can each be considered a different version of a summary (definition) of the target topic. We explore how to generate a series of summaries of various lengths from these sources. To obtain more reliable associations between sentences, we introduce wiki concepts derived from the internal links in Wikipedia. In addition, we develop an extended document concept lattice model that combines wiki concepts with non-textual features such as the outline and infobox. The model concatenates representative sentences from non-overlapping salient local topics to generate a summary. We test our model on our annotated wiki articles, whose topics come from the TREC-QA 2004-2006 evaluations. The results show that the model is effective in summarization and definition QA.
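
As a rough illustration of the two ideas above (wiki concepts taken from internal links, and summaries built from sentences that cover non-overlapping concepts), the following minimal sketch may help. It is not the extended document concept lattice model: it substitutes a simple greedy coverage heuristic and ignores the outline and infobox. The names `wiki_concepts`, `strip_links`, and `summarize`, the `max_words` budget, and the assumption that sentences arrive as raw wikitext with `[[Target|anchor]]` links are all hypothetical choices made for this example.

```python
import re

# Matches MediaWiki internal links: [[Target]], [[Target#Section]] or [[Target|anchor]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def wiki_concepts(sentence):
    """Return the set of link targets (treated here as wiki concepts) in a sentence."""
    return {m.group(1).strip().lower() for m in LINK_RE.finditer(sentence)}


def strip_links(sentence):
    """Replace each internal link with its visible text."""
    return LINK_RE.sub(lambda m: m.group(2) or m.group(1), sentence)


def summarize(sentences, max_words):
    """Greedy stand-in for sentence selection (an assumption, not the lattice model).

    Repeatedly picks the sentence that adds the most not-yet-covered concepts
    and still fits in the word budget, then returns picks in document order.
    """
    concepts = [wiki_concepts(s) for s in sentences]
    plain = [strip_links(s) for s in sentences]
    covered, chosen, used = set(), [], 0
    while True:
        best, best_gain = None, 0
        for i, sentence_concepts in enumerate(concepts):
            if i in chosen:
                continue
            gain = len(sentence_concepts - covered)
            fits = used + len(plain[i].split()) <= max_words
            if fits and gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # nothing left that adds a new concept and fits
            break
        chosen.append(best)
        covered |= concepts[best]
        used += len(plain[best].split())
    return [plain[i] for i in sorted(chosen)]
```

Calling `summarize` with increasing `max_words` values yields a series of progressively longer summaries, loosely mirroring the range from a one-sentence definition to a fuller overview. Sentences whose concepts are already covered are skipped, which is how this sketch approximates drawing on non-overlapping topics.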
