Talaia: A Real-Time Monitor of Social Media and Digital Press

Talaia is a platform for monitoring social media and digital press. A configurable crawler gathers content related to user-defined domains or topics. The crawled data is processed by the IXA-pipes NLP chain and the EliXa sentiment analysis system. A Django-powered interface provides data visualizations that support the user's analysis of the data. This paper presents the architecture of the system and describes each of its components in detail. To demonstrate the validity of the approach, two real use cases are reported, one in the cultural domain and one in the political domain. An evaluation of the sentiment analysis task in both scenarios is also provided, showing the system's capacity for domain adaptation.
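The processing flow summarized above (collect content for user-defined topics, analyze its sentiment, aggregate the results for visualization) can be sketched as follows. This is a minimal illustration under stated assumptions, not Talaia's implementation: topics are modeled as hypothetical keyword sets, a regex tokenizer stands in for IXA-pipes, and a toy lexicon-based scorer is swapped in for the EliXa sentiment analysis system. All names (`TOPICS`, `LEXICON`, `monitor`, `Document`) are hypothetical.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical origin label, e.g. "twitter" or "press"
    text: str


# Hypothetical user-defined topics: each topic maps to a set of lowercase keywords.
TOPICS = {
    "culture": {"museum", "festival", "exhibition"},
    "politics": {"election", "parliament", "minister"},
}

# Toy polarity lexicon; a stand-in for EliXa, NOT its actual model or API.
LEXICON = {"great": 1, "love": 1, "excellent": 1, "bad": -1, "boring": -1, "corrupt": -1}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (a crude stand-in for IXA-pipes tokenization)."""
    return TOKEN_RE.findall(text.lower())


def match_topics(tokens: list[str]) -> set[str]:
    """Return every topic whose keyword set overlaps the document's tokens."""
    token_set = set(tokens)
    return {topic for topic, keywords in TOPICS.items() if token_set & keywords}


def polarity(tokens: list[str]) -> str:
    """Sum lexicon scores and map the total to a three-way polarity label."""
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def monitor(docs: list[Document]) -> dict[str, Counter]:
    """Keep topic-relevant documents, label their polarity, and count labels per topic."""
    summary: dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        tokens = tokenize(doc.text)
        topics = match_topics(tokens)
        if not topics:
            continue  # discard content unrelated to any user-defined topic
        label = polarity(tokens)
        for topic in topics:
            summary[topic][label] += 1
    return dict(summary)


if __name__ == "__main__":
    docs = [
        Document("twitter", "Great exhibition at the museum"),
        Document("press", "The minister gave a boring speech in parliament"),
        Document("press", "Election results announced today"),
        Document("twitter", "Nice weather"),
    ]
    for topic, counts in monitor(docs).items():
        print(topic, dict(counts))
```

Here the last document matches no topic and is dropped, while the per-topic polarity counts are the kind of aggregate a visualization layer, such as the Django interface described above, could plot over time. A real deployment would replace the keyword filter and lexicon with the crawler configuration and trained sentiment models.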
