The Web Information Extraction for Update Summarization Based on Shallow Parsing

Traditional text information extraction methods operate mainly on static documents and struggle to reflect how information on the web evolves as it is updated. To address this challenge, this work proposes a new method based on rule-driven shallow parsing. The rules are derived from syntactic features of English text, such as verb tense and the use of modal verbs. The method extracts the latest novel information from English news texts accurately, so that users can access updated information about developing events quickly and effectively. Experimental results show that the proposed scheme improves extraction performance.
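To make the idea of tense and modal-verb rules concrete, the following minimal Python sketch flags sentences that carry surface cues of new or developing information. It is an illustration only, not the rule set used in this work: the cue lists (`MODALS`, `PERFECT_AUX`, `PROGRESSIVE_AUX`), the function names, and the suffix heuristics are all assumptions, and a simple regular-expression tokenizer stands in for a real shallow parser.

```python
import re

# Hypothetical cue lists; the actual rules of the proposed method are not
# specified in the abstract, so these are illustrative assumptions.
MODALS = {"will", "would", "may", "might", "could", "shall", "should", "can", "must"}
PERFECT_AUX = {"has", "have"}
PROGRESSIVE_AUX = {"is", "are", "am"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[A-Za-z']+", sentence.lower())


def update_cues(sentence):
    """Return the list of tense/modal cues found in a sentence.

    The suffix checks are a crude stand-in for part-of-speech tags:
    "has/have" followed by a word ending in -ed/-en is treated as present
    perfect, and "is/are/am" followed by a word ending in -ing is treated
    as present progressive.
    """
    tokens = tokenize(sentence)
    cues = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in MODALS:
            cues.append("modal")
        elif tok in PERFECT_AUX and nxt.endswith(("ed", "en")):
            cues.append("present_perfect")
        elif tok in PROGRESSIVE_AUX and nxt.endswith("ing"):
            cues.append("present_progressive")
    return cues


def extract_updates(sentences):
    """Keep only the sentences that contain at least one update cue."""
    return [s for s in sentences if update_cues(s)]
```

Under these assumptions, a simple past-tense sentence such as "The storm hit the coast on Monday." yields no cues and is dropped, while sentences containing a modal, a present perfect, or a present progressive construction are kept as candidate updates. A fuller implementation would replace the suffix heuristics with part-of-speech tags and chunk patterns.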