论文信息 - Topic-centric Classification of Twitter User's Political Orientation

Topic-centric Classification of Twitter User's Political Orientation

In the recent Scottish Independence Referendum (hereafter, IndyRef), Twitter offered a broad platform for people to express their opinions, with millions of IndyRef tweets posted over the campaign period. In this paper, we aim to classify people's voting intentions by the content of their tweets---their short messages communicated on Twitter. By observing tweets related to the IndyRef, we find that people not only discussed the vote, but raised topics related to an independent Scotland including oil reserves, currency, nuclear weapons, and national debt. We show that the views communicated on these topics can inform us of the individuals' voting intentions ("Yes"--in favour of Independence vs. "No"--Opposed). In particular, we argue that an accurate classifier can be designed by leveraging the differences in the features' usage across different topics related to voting intentions. We demonstrate improvements upon a Naive Bayesian classifier using the topics enrichment method. Our new classifier identifies the closest topic for each unseen tweet, based on those topics identified in the training data. Our experiments show that our Topics-Based Naive Bayesian classifier improves accuracy by 7.8% over the classical Naive Bayesian baseline.

[1] Wendy Liu,et al. Homophily and Latent Attribute Inference: Inferring Latent Attributes of Twitter Users from Neighbors , 2012, ICWSM.

[2] Derek Ruths,et al. Classifying Political Orientation on Twitter: It's Not Easy! , 2013, ICWSM.

[3] Jacob Ratkiewicz,et al. Political Polarization on Twitter , 2011, ICWSM.

[4] Dunja Mladenic,et al. Feature Selection for Unbalanced Class Distribution and Naive Bayes , 1999, ICML.

[5] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[6] Pablo Barberá. Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data , 2015, Political Analysis.