Efficient Density Based Clustering of Tweets and Sentimental Analysis Based on Segmentation

Twitter has become a popular social networking site where users share up-to-date information. Because tweets are short and error-prone, word-based representations of them are less reliable. Tweet segmentation is the process of splitting tweets into meaningful segments so that their semantic meaning is preserved and the segments are easy for downstream applications to use. Segmentation is based on a stickiness score that considers both global and local context. Tweets are clustered using the DBSCAN method with the Jaccard coefficient as the similarity measure. Variations in tweet sentiment are measured on the basis of the segments. The experimental evaluation shows that segmentation using global terms drawn from wikilinks is more effective than normal segmentation. Clustering is also more effective with the DBSCAN algorithm, which is well suited to uncertain data.
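
The clustering step described above can be illustrated with a minimal sketch. It assumes each tweet has already been reduced to a set of segments, and it uses Jaccard distance (one minus the Jaccard coefficient) inside a plain DBSCAN loop. The function names, the `eps` and `min_pts` parameters, and the sample segments are illustrative assumptions, not the authors' implementation or settings.

```python
from typing import List, Optional, Set

NOISE = -1  # label given to tweets that belong to no cluster


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dbscan_jaccard(tweets: List[Set[str]], eps: float = 0.5,
                   min_pts: int = 2) -> List[int]:
    """Cluster segment sets with DBSCAN under Jaccard distance.

    eps and min_pts are hypothetical defaults chosen for illustration.
    Returns one cluster label per tweet; NOISE (-1) marks outliers.
    """
    n = len(tweets)
    labels: List[Optional[int]] = [None] * n  # None means "not yet visited"

    def neighbors(i: int) -> List[int]:
        # All tweets (including i itself) within Jaccard distance eps of i.
        return [j for j in range(n)
                if 1.0 - jaccard_similarity(tweets[i], tweets[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return [NOISE if label is None else label for label in labels]
```

With tweets represented as segment sets such as `{"new york", "marathon", "today"}`, tweets that share most of their segments fall into the same cluster, while a tweet that shares no segments with any other is labeled as noise.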