Design and implementation of text filtering with no semantic accidental injury

Information filtering on the Internet means finding and filtering objectionable words in large-scale web text, and accuracy and efficiency are the main concerns. This paper focuses on filtering mixed Chinese and English text. It proposes a Chinese and English text filtering algorithm, the No Semantic Accidental Injury Filter (NSAIF), designed to avoid semantic accidental injury. NSAIF is based on the Aho-Corasick (AC) algorithm but avoids the space expansion of the standard automaton by allocating memory dynamically. Because it stores text in one-byte units, it applies to both Chinese and English text. It uses the longest-match principle to find the words that should be filtered in a trie augmented with failure pointers. On test data sets of different sizes it shows good time and space performance, and it has high theoretical and practical value.