Economically-efficient sentiment stream analysis

Text-based social media channels, such as Twitter, produce torrents of opinionated data about a wide range of topics and entities. The analysis of such data (also known as sentiment analysis) is quickly becoming a key feature in recommender systems and search engines. A prominent approach to sentiment analysis applies classification techniques: content is classified according to the attitude of the writer. A major challenge, however, is that Twitter follows the data stream model, so classifiers must operate with limited resources, including labeled data and time for building classification models. A further challenge is that the sentiment distribution may change as the stream evolves. In this paper we address these challenges by proposing algorithms that select relevant training instances at each time step, so that training sets are kept small while giving the classifier the ability both to adapt to and to recover from different types of sentiment drift. Providing both capabilities simultaneously, however, is a conflicting-objective problem, and our proposed algorithms employ basic notions of Economics to balance them. We analyzed events that reverberated on Twitter, and a comparison against the state of the art reveals improvements both in error reduction (up to 14%) and in training resources (reduced by orders of magnitude).
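The idea of keeping a training set small while balancing two conflicting objectives can be illustrated with a minimal sketch. The abstract does not say which economic notion the algorithms use, so this example assumes Pareto efficiency as one plausible choice. The `Instance` type, the `recency` and `similarity` scores, and the function names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    text: str
    recency: float     # hypothetical score: higher means more recent
    similarity: float  # hypothetical score: higher means closer to the current message


def dominates(a: Instance, b: Instance) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    return (
        a.recency >= b.recency
        and a.similarity >= b.similarity
        and (a.recency > b.recency or a.similarity > b.similarity)
    )


def pareto_select(candidates: list[Instance]) -> list[Instance]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative pool of labeled messages with made-up scores.
pool = [
    Instance("old but very similar", 0.1, 0.9),
    Instance("fresh but dissimilar", 0.9, 0.2),
    Instance("balanced", 0.6, 0.6),
    Instance("stale and dissimilar", 0.1, 0.1),
    Instance("dominated by balanced", 0.5, 0.5),
]
selected = pareto_select(pool)
```

Under these assumptions, only the three non-dominated instances are retained for training. Instances that are worse on both scores are discarded, which is one way a training set could stay small without favoring either objective outright.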
