Twitterati Identification System

Abstract Twitter is an online service that plays the dual roles of social network and microblogging platform. Users communicate with one another by publishing text- and media-based posts called tweets. Lately, Twitter has attracted a large number of automated programs, known as bots. Most bots generate large volumes of benign tweets that deliver news and update feeds, but some are created to spread spam or malicious content. To help human users identify who they are communicating with, this project focuses on classifying Twitter accounts as human or bot. We collected statistics on a number of Twitter users, including their tweets, bot tweets, and other account features and characteristics. The data are then analyzed statistically to create a labeled training set of bots and humans. The proposed classification system uses a number of Twitter attributes, and every stage of the system makes a decision about the account under test. A decision tree is generated from the statistical training data, and rules derived from the tree are used to classify a Twitter user as a human or a bot. The properties, based on Twitter features, that help distinguish a human from a bot are discussed and implemented in this paper. The results indicate that the more attributes are used, the better the detection. The classification based on the statistical training set remains consistent across test sets of varying sizes.
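The decision-tree step described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this work: the three feature names, the toy training rows, and the use of Gini impurity as the split criterion are all assumptions chosen for illustration. The sketch grows a small tree from labeled accounts and then flattens it into if-then rules of the kind the abstract mentions.

```python
from collections import Counter

# Hypothetical features (assumed for illustration, not taken from the paper).
FEATURES = ["tweets_per_day", "url_ratio", "follower_friend_ratio"]

# Made-up toy training rows, for illustration only: (feature vector, label).
TRAIN = [
    ([3.0, 0.10, 1.2], "human"),
    ([5.0, 0.20, 0.9], "human"),
    ([1.0, 0.05, 1.5], "human"),
    ([8.0, 0.30, 1.1], "human"),
    ([120.0, 0.95, 0.1], "bot"),
    ([80.0, 0.90, 0.2], "bot"),
    ([200.0, 0.85, 0.05], "bot"),
    ([6.0, 0.92, 0.3], "bot"),
]


def gini(rows):
    """Gini impurity of the labels in a non-empty list of rows."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def best_split(rows):
    """Return (weighted impurity, feature index, threshold), or None if no split exists."""
    best = None
    for f in range(len(FEATURES)):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows, depth=0, max_depth=3):
    """Grow a tree: a leaf is a label string, an inner node is (feature, threshold, left, right)."""
    labels = Counter(label for _, label in rows)
    split = best_split(rows)
    if len(labels) == 1 or depth == max_depth or split is None:
        return labels.most_common(1)[0][0]  # majority label at this leaf
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, build(left, depth + 1, max_depth), build(right, depth + 1, max_depth))


def classify(tree, x):
    """Walk from the root to a leaf and return its label."""
    while not isinstance(tree, str):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def rules(tree, path=()):
    """Flatten the tree into readable if-then rules, one per leaf."""
    if isinstance(tree, str):
        return [(" and ".join(path) + " => " + tree) if path else ("=> " + tree)]
    f, t, left, right = tree
    return (rules(left, path + (f"{FEATURES[f]} <= {t:g}",))
            + rules(right, path + (f"{FEATURES[f]} > {t:g}",)))


tree = build(TRAIN)
for rule in rules(tree):
    print(rule)
```

Calling `classify(tree, [150.0, 0.9, 0.1])` then labels a previously unseen account using the learned rules. With real data, the feature list and training rows would be replaced by the collected Twitter statistics, and adding attributes simply widens each feature vector.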