The rapidly growing World Wide Web (WWW) is no longer a passive information provider; Internet users themselves have become contributors to it. The large volume of user-generated data, together with non-user-generated data, makes ours an informative, though perhaps over-informed, society. This increasing amount of unorganized, haphazardly generated data has driven the momentum of big data analysis, which aims to discover and learn the hidden patterns behind the data. In this thesis, we look at two problems of mining knowledge from data. In the first project, we develop and test a new method to classify “democrats” and “republicans” on Twitter with the help of sentiment analysis techniques. This sentiment-based classification model addresses the problem that conventional quantitative features, such as tweet count, follower-to-following ratio and election tweet count, cannot predict the political alignment of Twitter users. We therefore extract users' sentiment toward selected events and topics and feed the resulting sentiment feature vectors into the classification model. The results show that, even with only the contextual information of tweets, the sentiment-based classification model performs as well as the conventional classification model. In addition, to automate topic selection, we propose and test a scheme that ranks topics by their degree of polarity, so that we can pick the topics that distinguish Twitter users the most. We also propose using social relationship graph information to adjust the sentiment vectors before feeding them into the classification model. Surprisingly, the graph-adjusted sentiment-based classification model achieves a classification accuracy above 80 percent. Finally, we compare our graph-based sentiment classification model with a Belief Propagation (BP) model on our dataset and discuss the effects of both.
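The abstract does not specify how topic polarity is measured, how the graph adjustment is computed, or which classifier is used, so the following is only a minimal Python sketch of the pipeline under stated assumptions. Polarity is assumed to be the absolute gap between the two groups' mean sentiment on a topic; the graph adjustment is assumed to be a linear blend of a user's vector with the mean of its neighbours' vectors, weighted by a hypothetical `alpha`; and a nearest-centroid classifier stands in for the thesis's classification model. All users, topics, labels and scores are hypothetical toy data.

```python
from statistics import mean

# Hypothetical toy data: per-user sentiment in [-1, 1] toward each topic.
TOPICS = ["healthcare", "taxes", "weather"]
VECTORS = {
    "alice": [0.8, -0.6, 0.1],
    "bob": [-0.7, 0.5, 0.0],
    "carol": [0.6, -0.4, -0.1],
    "dave": [-0.5, 0.7, 0.2],
    "erin": [-0.1, 0.1, 0.0],  # unlabeled user with a weak, ambiguous signal
}
LABELS = {"alice": "D", "bob": "R", "carol": "D", "dave": "R"}
GRAPH = {"erin": ["alice", "carol"]}  # hypothetical: erin follows alice and carol


def polarity(t, vectors, labels):
    """Assumed polarity score: gap between the two groups' mean sentiment on topic t."""
    dem = [vectors[u][t] for u in labels if labels[u] == "D"]
    rep = [vectors[u][t] for u in labels if labels[u] == "R"]
    return abs(mean(dem) - mean(rep))


def rank_topics(vectors, labels):
    """Topic indices sorted from most to least polarizing."""
    return sorted(range(len(TOPICS)),
                  key=lambda t: polarity(t, vectors, labels), reverse=True)


def project(vectors, topic_ids):
    """Keep only the selected topic dimensions of every user's vector."""
    return {u: [v[t] for t in topic_ids] for u, v in vectors.items()}


def graph_adjust(vectors, graph, alpha=0.5):
    """Assumed rule: blend each vector with the mean of its neighbours' original vectors."""
    adjusted = {}
    for user, vec in vectors.items():
        neighbours = [vectors[n] for n in graph.get(user, []) if n in vectors]
        if not neighbours:
            adjusted[user] = list(vec)  # no graph information: keep the vector unchanged
            continue
        neigh_mean = [mean(col) for col in zip(*neighbours)]
        adjusted[user] = [(1 - alpha) * v + alpha * m
                          for v, m in zip(vec, neigh_mean)]
    return adjusted


def nearest_centroid(vectors, labels, user):
    """Stand-in classifier: the label whose centroid is closest in squared distance."""
    centroids = {
        lab: [mean(col) for col in zip(*(vectors[u] for u in labels if labels[u] == lab))]
        for lab in set(labels.values())
    }
    return min(centroids, key=lambda lab: sum(
        (x - c) ** 2 for x, c in zip(vectors[user], centroids[lab])))


if __name__ == "__main__":
    top = rank_topics(VECTORS, LABELS)[:2]  # keep the two most polarizing topics
    print("top topics:", [TOPICS[t] for t in top])
    raw = project(VECTORS, top)
    adjusted = graph_adjust(raw, GRAPH)
    print("erin, raw:", nearest_centroid(raw, LABELS, "erin"))
    print("erin, graph-adjusted:", nearest_centroid(adjusted, LABELS, "erin"))
```

On this toy data the two most polarizing topics are healthcare and taxes. The unlabeled user `erin` is assigned to R from her own weak sentiment alone, but to D after her vector is blended with those of the two D-labeled users she follows. This only illustrates how neighbour information can shift an ambiguous vector; it says nothing about the accuracy reported above.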