Analyzing tweets to identify malicious messages

As social networking grows in popularity, it opens a new frontier of communication. Sites such as Facebook, LinkedIn, and Twitter are changing the way we communicate, often replacing a phone call or an email. In this paper, we examine the detection of spam and phishing on the Twitter network. We argue that spammers and phishers use specific keywords to entice a Twitter user to click on a link, which may lead to a malicious web form. A phishing or spam message therefore contains both words and a URL. Because Twitter limits each message to 140 characters, the words used in a message carry much more weight. Bayesian filtering is a popular approach to email spam detection that uses the presence or absence of individual words to label the message as a whole. We rule out the Bayesian approach as a viable option in this setting and propose the use of a logistic regression model instead. Current studies place emphasis on the follower/followee ratio; we will show that reliance on this ratio is misplaced. Our goal is to detect spam effectively and to minimize its influence.
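
To make the keyword-based idea concrete, the following is a minimal sketch of a logistic regression classifier over word-presence features. It is an illustration only and not the model evaluated in this paper: the keyword list, the URL flag, the toy training tweets, and the function names (`featurize`, `train`, `predict_proba`) are all assumptions introduced for the example.

```python
import math

# Hypothetical keyword vocabulary, chosen for illustration only.
KEYWORDS = ["free", "win", "click", "prize"]

def featurize(tweet):
    """Return one 0/1 feature per keyword, plus a 0/1 flag for a URL."""
    words = tweet.lower().split()
    feats = [1.0 if k in words else 0.0 for k in KEYWORDS]
    feats.append(1.0 if any(w.startswith("http") for w in words) else 0.0)
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=500):
    """Fit weights and bias by plain stochastic gradient descent on log loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_proba(tweet, w, b):
    """Estimated probability that the tweet is spam."""
    x = featurize(tweet)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Made-up training tweets (1 = spam, 0 = legitimate); not real data.
TRAIN = [
    ("win a free prize click http://example.com/a", 1),
    ("click here free gift http://example.com/b", 1),
    ("free prize inside http://example.com/c", 1),
    ("lunch with friends today", 0),
    ("reading a good article http://example.com/d", 0),
    ("great game last night", 0),
]

X = [featurize(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
weights, bias = train(X, y)
```

Calling `predict_proba("win a free prize http://example.com/x", weights, bias)` gives the estimated spam probability for a new tweet. Unlike a pure presence/absence rule, each keyword receives a learned weight, so a URL alone need not mark a tweet as spam.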