Apply the Dynamic N-gram to Extract the Keywords of Chinese News

The explosive growth of information on the Internet has created a great demand for new and powerful tools to acquire useful information. The first step in retrieving information from a Chinese article is word segmentation. However, two major problems can reduce segmentation accuracy: ambiguity and long words. In this paper, we propose a novel character-based approach, dynamic N-gram (DNG), to deal with these two problems, and we apply it to Chinese news articles to evaluate its accuracy. The evaluation results indicate that most readers agreed that the dynamic N-gram approach could extract meaningful keywords. The keyword extraction results showed no significant difference across news categories. The primary contribution of this approach is that dynamic N-gram helps extract the most meaningful keywords from different types of Chinese articles without having to consider the number of grams.