Paper Information - The Web Information Extraction for Update Summarization Based on Shallow Parsing

The Web Information Extraction for Update Summarization Based on Shallow Parsing

Traditional text information extraction methods operate mainly on static documents and struggle to reflect how information on the web evolves as it is updated. To address this challenge, this work proposes a new method based on shallow parsing with rules. The rules are generated from syntactic features of English text, such as verb tense and the use of modal verbs. The method correctly extracts the latest novel information from English news texts, so that users can access updated information about developing events quickly and effectively. Performance results show that the proposed scheme is an improvement.
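The abstract does not list the actual rules, so the following is only a minimal sketch of what rule-based filtering on verb tense and modal verbs might look like. The cue lists, the suffix heuristics, and the function names `update_cues` and `extract_updates` are all hypothetical illustrations, not the authors' method; a real shallow parser would use part-of-speech tags and chunking rather than plain word matching.

```python
import re

# Hypothetical cue lists: the paper's real rules are not given in the abstract.
MODALS = {"will", "would", "may", "might", "shall", "should", "can", "could", "must"}
PERFECT_AUX = {"has", "have"}
PROGRESSIVE_AUX = {"is", "are", "am"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def update_cues(sentence):
    """Return the list of tense/modal cues found in a sentence.

    Assumed heuristics (illustrative only):
      - any modal verb                      -> "modal"
      - has/have + word ending in -ed/-en   -> "present_perfect"
      - is/are/am + word ending in -ing     -> "present_progressive"
    """
    tokens = tokenize(sentence)
    cues = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in MODALS:
            cues.append("modal")
        elif tok in PERFECT_AUX and nxt.endswith(("ed", "en")):
            cues.append("present_perfect")
        elif tok in PROGRESSIVE_AUX and nxt.endswith("ing"):
            cues.append("present_progressive")
    return cues


def extract_updates(sentences):
    """Keep only the sentences that carry at least one update cue."""
    return [s for s in sentences if update_cues(s)]


if __name__ == "__main__":
    news = [
        "The storm hit the coast on Monday.",
        "Officials have confirmed three new cases.",
        "Rescue teams are searching the area.",
        "The governor will visit the region tomorrow.",
    ]
    for sentence in extract_updates(news):
        print(update_cues(sentence), sentence)
```

In this sketch, a simple-past sentence with no modal is treated as background, while present perfect, present progressive, and modal constructions are treated as signals of new or developing information. Word matching of this kind misfires on homographs (for example "may" as a month or "can" as a noun), which is one reason a tagger-based shallow parse would be preferable in practice.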
