SentiCorr: Multilingual Sentiment Analysis of Personal Correspondence

We present a system for automated sentiment analysis of multilingual user-generated content from social media and e-mail. One of the main goals of the system is to make people aware of how much positive and negative content they read and write. The output is summarized in a database that supports basic OLAP-style exploration of the data along dimensions such as time and correspondent. The sentiment analysis follows a four-step approach: language identification for short texts, part-of-speech tagging, subjectivity detection, and polarity detection. We extensively tested our system on data from Twitter, Facebook and Hyves. We also developed an MS Outlook sentiment analysis plug-in that shows people how positive or negative the content of the e-mails is and lets them confirm or correct the sentiment classification at the sentence or e-mail level.
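To make the four-step approach concrete, the following is a minimal sketch of how such a pipeline could be chained together. It is an illustration only and not the SentiCorr implementation: the stopword-overlap language guesser, the lookup-based tagger, the tiny lexicons, and all function names (`identify_language`, `pos_tag`, `is_subjective`, `polarity`, `analyse`) are hypothetical stand-ins for the trained components a real system would use.

```python
import re

# Hypothetical toy resources; a real system would use trained models.
STOPWORDS = {
    "en": {"the", "and", "this", "was"},
    "nl": {"de", "het", "een", "dit"},
}
# Per-language lexicon: word -> (part-of-speech tag, polarity score).
LEXICON = {
    "en": {"great": ("ADJ", 1), "awful": ("ADJ", -1)},
    "nl": {"geweldig": ("ADJ", 1), "vreselijk": ("ADJ", -1)},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def identify_language(tokens):
    """Step 1: guess the language by stopword overlap (toy heuristic)."""
    seen = set(tokens)
    return max(STOPWORDS, key=lambda lang: len(seen & STOPWORDS[lang]))


def pos_tag(tokens, lang):
    """Step 2: tag tokens by lexicon lookup; unknown words get tag 'X'."""
    return [(tok, LEXICON[lang].get(tok, ("X", 0))[0]) for tok in tokens]


def is_subjective(tagged):
    """Step 3: treat a text as subjective if it has an opinion adjective."""
    return any(tag == "ADJ" for _, tag in tagged)


def polarity(tagged, lang):
    """Step 4: sum the lexicon scores of adjectives and map to a label."""
    score = sum(LEXICON[lang][tok][1] for tok, tag in tagged if tag == "ADJ")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyse(text):
    """Run the four steps in order; objective texts skip polarity."""
    tokens = tokenize(text)
    lang = identify_language(tokens)
    tagged = pos_tag(tokens, lang)
    if not is_subjective(tagged):
        return lang, "objective"
    return lang, polarity(tagged, lang)


print(analyse("This was a great day"))
print(analyse("Dit is een vreselijk bericht"))
```

The ordering matters in this sketch: language identification runs first because the later steps select language-specific resources, and polarity detection is applied only to texts that the subjectivity step lets through.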