Semi-Automatic Training Set Construction for Supervised Sentiment Analysis in Polarized Contexts

Standard sentiment analysis techniques usually rely either on sets of rules based on semantic and affective information or on machine learning approaches whose quality depends heavily on the size and significance of a training set of pre-labeled text samples. In many situations, this labeling must be performed by hand, which can limit the size of the training set. To address this issue, we propose a methodology to retrieve text samples from Twitter and label them automatically. We also tackle the situation in which the base rates of positive and negative sentiment samples in the training and test sets are biased with respect to the system in which the classifier is intended to be applied.
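To illustrate the base-rate issue, the sketch below applies the standard prior-shift correction to a classifier's predicted probability of positive sentiment. It is not necessarily the procedure used in this work: the function name `adjust_for_base_rate` and its parameters are hypothetical, and the correction assumes that only the class proportions differ between the training set and the target system, not the way each class expresses itself in text.

```python
def adjust_for_base_rate(p_pos, train_rate, target_rate):
    """Re-weight a positive-class probability from the training base rate
    to the base rate assumed for the target system (prior-shift correction).

    p_pos       -- probability of the positive class given by the classifier
    train_rate  -- fraction of positive samples in the training set
    target_rate -- fraction of positive samples assumed in the target system
    """
    for name, value in (("train_rate", train_rate), ("target_rate", target_rate)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    if not 0.0 <= p_pos <= 1.0:
        raise ValueError("p_pos must lie between 0 and 1")

    # Scale each class score by the ratio of target prior to training prior.
    pos = p_pos * target_rate / train_rate
    neg = (1.0 - p_pos) * (1.0 - target_rate) / (1.0 - train_rate)

    # Renormalize so the two class probabilities sum to one.
    return pos / (pos + neg)
```

For example, a score of 0.5 from a classifier trained on a balanced set becomes 0.8 when the target system is assumed to contain 80% positive samples, and the score is unchanged when both base rates coincide.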
