Two-Phase Approach for Spam-Mail Filtering

This paper describes a two-phase method for filtering spam mail based on textual information and hyperlinks. Because the body of a spam mail has little text information, it provides insufficient hints to distinguish spam mail from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from both the original email body and the fetched webpages. We divide the hints into two kinds of information: definite information and less definite textual information. In our experiment, the method of fetching webpages improved the F-measure by 9.4% over the method that uses only the original email header and body.
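The link-following step can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names (extract_links, fetch_page, tokenize, collect_features), the regular expressions, and the bag-of-words representation are all assumptions made for illustration. The sketch pulls hyperlinks out of an email body, fetches each linked page, and merges word counts from the body and the fetched pages into one feature set. It does not model the split into definite and less definite information.

```python
import re
from collections import Counter
from urllib.request import urlopen

# Hypothetical patterns; a production filter would use a real HTML parser.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
TAG_PATTERN = re.compile(r"<[^>]+>")
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def extract_links(body):
    """Return the hyperlinks found in the email body, in order, without duplicates."""
    links = []
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,;)")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


def fetch_page(url, timeout=5):
    """Download a page and return its text; return '' on network or URL errors."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def tokenize(text):
    """Strip HTML tags, lowercase the text, and split it into alphanumeric tokens."""
    return TOKEN_PATTERN.findall(TAG_PATTERN.sub(" ", text).lower())


def collect_features(body, fetch=fetch_page):
    """Count tokens from the email body plus every page it links to.

    The fetch argument is injectable so that the sketch can be exercised
    without network access.
    """
    features = Counter(tokenize(body))
    for url in extract_links(body):
        features.update(tokenize(fetch(url)))
    return features
```

Calling collect_features(body) with the default fetcher performs real HTTP requests. Passing a stub, for example fetch=lambda url: "<p>cheap pills</p>", keeps the example offline and deterministic. The resulting counts could then be passed to any text classifier.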