A classification method for spam e-mail by Self-Organizing Map and automatically defined groups

E-mail has become harder to rely on as a communication tool because the number of messages that are infected with viruses or recognized as spam keeps increasing. E-mail filtering software removes such problematic messages. However, filters sometimes misjudge: an important message may be classified as spam, and the recipient then never receives it. In this paper, we propose a classification method for spam e-mail based on the results of SpamAssassin, an open-source software package that identifies spam signatures. The method can learn the patterns of spam and ham e-mails and recognize them correctly. First, the method divides e-mails into categories with a Self-Organizing Map (SOM); it then extracts correct judgment rules with Automatically Defined Groups (ADGs), even when the results from SpamAssassin are wrong. To verify the effectiveness of the proposed method, we examined approximately 3,000 e-mails.
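To make the first stage concrete, the following is a minimal sketch of how a SOM could group e-mails, not the authors' implementation. It assumes, hypothetically, that each e-mail is encoded as a vector of 0/1 indicators for whether a given SpamAssassin test fired; the map size, learning-rate schedule, function names (`train_som`, `best_matching_unit`), and the toy data are all illustrative choices. The ADG rule-extraction stage is not shown.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])    # pull toward the sample
            step += 1
    return weights


if __name__ == "__main__":
    # Hypothetical indicator vectors: one column per (assumed) SpamAssassin test.
    emails = {
        "mail-a": [1, 1, 0, 0],
        "mail-b": [1, 1, 0, 1],
        "mail-c": [0, 0, 1, 0],
        "mail-d": [0, 0, 1, 1],
    }
    som = train_som(list(emails.values()), rows=2, cols=2)
    for name, vec in emails.items():
        print(name, "-> cell", best_matching_unit(som, vec))
```

Each e-mail is assigned to the map cell of its best-matching unit, and the e-mails sharing a cell form one category. In a pipeline like the one described above, a second stage would then derive judgment rules per category.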