Natural Language Processing Technologies for Multi-Level Intelligent Spam Mail-Filter

To address the shortcomings of existing mail filtering systems, we designed a multi-level, intelligent, content-based message filtering system. It uses natural language processing to represent the content of an e-mail, including its attachments. First, it pre-processes the e-mail content, which includes word segmentation and feature extraction. Second, it combines a knowledge base with feature expansion to form a feature vector for the message. The similarity between this vector and the corresponding category vector stored in the database determines the credibility of the message. Based on this approach, we implemented a mail filtering system on the Java EE 6 + SQL Server 2005 platform, with the aim of eliminating as much spam as possible. Its major features are as follows: 1) Black/white list filtering: messages from blacklisted senders are intercepted and messages from whitelisted senders are released. 2) Reverse DNS testing: this effectively eliminates anonymous e-mail attacks. 3) Content-based message filtering: accurate analysis of the mail content filters out suspicious messages. 4) Fingerprint recognition: it borrows the biological concept of fingerprint identification to identify spam. 5) User-personalized filtering: users define their own filtering rules. 6) Intent detection: it detects URL links in the e-mail content. Experiments show that the mail filtering system is effective at filtering spam.
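
The vector comparison step described above can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so cosine similarity is assumed here; the class name `SimilaritySketch`, the term weights, and the sample vectors are all hypothetical and are not taken from the described system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compares a message vector with a category vector.
// Cosine similarity is an assumption; the source does not specify the measure.
public class SimilaritySketch {

    // Cosine similarity between two sparse term-weight vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector is treated as dissimilar to everything
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Illustrative term weights for one message (made-up values).
        Map<String, Double> message = new HashMap<String, Double>();
        message.put("free", 2.0);
        message.put("offer", 1.0);

        // Illustrative category vector, as it might be stored in the database.
        Map<String, Double> spamCategory = new HashMap<String, Double>();
        spamCategory.put("free", 1.0);
        spamCategory.put("offer", 1.0);
        spamCategory.put("winner", 1.0);

        System.out.println("similarity = " + cosine(message, spamCategory));
    }
}
```

In a sketch like this, a higher score means the message is closer to the category vector; any threshold that turns the score into a spam decision would have to be chosen experimentally.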
