SMSAssassin: crowdsourcing driven mobile-based system for SMS spam filtering

Due to increase in use of Short Message Service (SMS) over mobile phones in developing countries, there has been a burst of spam SMSes. Content-based machine learning approaches were effective in filtering email spams. Researchers have used topical and stylistic features of the SMS to classify spam and ham. SMS spam filtering can be largely influenced by the presence of regional words, abbreviations and idioms. We have tested the feasibility of applying Bayesian learning and Support Vector Machine(SVM) based machine learning techniques which were reported to be most effective in email spam filtering on a India centric dataset. In our ongoing research, as an exploratory step, we have developed a mobile-based system SMSAssassin that can filter SMS spam messages based on bayesian learning and sender blacklisting mechanism. Since the spam SMS keywords and patterns keep on changing, SMSAssassin uses crowd sourcing to keep itself updated. Using a dataset that we are collecting from users in the real-world, we evaluated our approaches and found some interesting results.