Topic Models based Personalized Spam Filter

text categorization as the features of text is continuously changing. Spam evolves continuously and makes it difficult for the filter to classify the evolving and evading new feature patterns. Most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. This paper presents a system for automatically detection and filtering of unsolicited electronic messages. In this paper, we have developed a content-based classifier, which uses two topic models LSI and PLSA complemented with a text pattern-matching based natural language approach. By combining these powerful statistical and NLP techniques we obtained a parallel content based Spam filter, which performs the filtration in two stages. In the first stage each model generates its individual predictions, which are combined by a voting mechanism as the second stage.