Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering

Statistics-based spam filtering performs poorly on new-type spam that uses synonym substitution and camouflage, because it judges each word in isolation and ignores the semantic relations between words in the text. This paper therefore proposes a spam filtering method based on the semantic body. The method extracts e-mail features using lexical chains built on HowNet together with statistical TF-IDF weighting, and then handles spam with text clustering. Experimental results show that the proposed method is effective at filtering new-type spam.
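
To illustrate why semantic features help against synonym substitution, the following is a minimal sketch under stated assumptions, not the implementation evaluated in this paper. The `CONCEPTS` table is a hypothetical stand-in for a HowNet lookup, the flat word-to-concept mapping is a simplification of lexical chains, the smoothed IDF variant and the single-pass threshold clustering in `single_pass_cluster` are illustrative choices, and English toy tokens replace segmented Chinese text for readability.

```python
import math
from collections import Counter

# Hypothetical stand-in for a HowNet lookup: surface word -> concept label.
CONCEPTS = {
    "cheap": "LOW_PRICE", "discount": "LOW_PRICE",
    "pills": "MEDICINE", "meds": "MEDICINE",
    "buy": "PURCHASE", "order": "PURCHASE",
}

def to_concepts(tokens):
    """Replace each word by its concept label; unknown words are kept as-is."""
    return [CONCEPTS.get(t, t) for t in tokens]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass_cluster(vectors, threshold=0.5):
    """Assign each vector to the first cluster whose seed is similar enough."""
    seeds, labels = [], []
    for v in vectors:
        for i, seed in enumerate(seeds):
            if cosine(v, seed) >= threshold:
                labels.append(i)
                break
        else:
            seeds.append(v)          # no close cluster: start a new one
            labels.append(len(seeds) - 1)
    return labels

emails = [
    "buy cheap pills now".split(),         # known spam
    "order discount meds today".split(),   # same spam with synonyms swapped
    "meeting agenda for monday".split(),   # legitimate mail
]

word_vecs = tfidf_vectors(emails)
concept_vecs = tfidf_vectors([to_concepts(e) for e in emails])

sim_words = cosine(word_vecs[0], word_vecs[1])
sim_concepts = cosine(concept_vecs[0], concept_vecs[1])
word_labels = single_pass_cluster(word_vecs)
concept_labels = single_pass_cluster(concept_vecs)
```

In this toy setting the two spam messages share no surface words, so word-level TF-IDF places them in separate clusters, whereas the concept-level vectors overlap on three of four terms and the messages fall into the same cluster.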