Web-Based Verification on the Representativeness of Terms Extracted from Single Short Documents

Single-document summarization is useful for extracting the major ideas from huge amounts of daily information. However, it is a challenge to distinguish the relative importance of terms. In this paper, we propose a Web-based approach to term verification. The search results of extracted terms are used as their expanded representation, and the similarity of these results to the original document is calculated as an estimate of term representativeness. We experimented with term extraction methods on multilingual news extracts and compared the effectiveness of term verification with various Jaccard similarity measures. The experimental results show the feasibility of Web-based verification of the representativeness of extracted terms.
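
The verification idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain Jaccard similarity over sets of word tokens, a simple regular-expression tokenizer (which would not segment languages such as Chinese), and search results supplied by the caller as lists of snippet strings. The function names `tokenize`, `jaccard`, `representativeness`, and `rank_terms` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Plain Jaccard similarity |A ∩ B| / |A ∪ B| between two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def representativeness(document, snippets):
    """Score a term by comparing the document with the term's search snippets.

    `snippets` stands in for the search results of one extracted term;
    how they are retrieved is outside this sketch.
    """
    expanded = set()
    for snippet in snippets:
        expanded |= tokenize(snippet)
    return jaccard(tokenize(document), expanded)


def rank_terms(document, search_results):
    """Return (term, score) pairs sorted from most to least representative."""
    scores = {
        term: representativeness(document, snippets)
        for term, snippets in search_results.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

With a document and a dictionary that maps each candidate term to its snippets, `rank_terms` orders the terms by how closely their expanded representation overlaps the document. A weighted or n-gram variant of Jaccard could be substituted in `jaccard` without changing the rest of the sketch.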
