Incremental Active Opinion Learning Over a Stream of Opinionated Documents

Applications that learn from opinionated documents, such as tweets or product reviews, face two challenges. First, the opinionated documents constitute an evolving stream, where both the author's attitude and the vocabulary itself may change. Second, document labels are scarce and word labels are unreliable, because the sentiment of a word depends on the (unknown) context in the author's mind. Most research on mining opinionated streams focuses on the first aspect of the problem, while for the second it assumes a continuous supply of labels from the stream. Such an assumption, though, is utopian: the stream is infinite and the labeling cost is prohibitive. To this end, we investigate the potential of active stream learning algorithms that ask for labels on demand. Our proposed approach, ACOSTREAM, works with limited labels. It uses an initial seed of labeled documents, occasionally requests additional labels for documents from a human expert, and incrementally adapts to the underlying stream while exploiting the available labeled documents. At its core, ACOSTREAM consists of a Multinomial Naive Bayes (MNB) classifier coupled with "sampling" strategies for requesting class labels for new unlabeled documents. In the experiments, we evaluate the classifier's performance over time by varying (a) the class distribution of the opinionated stream, while assuming that the set of words in the vocabulary is fixed but their polarities may change with the class distribution; and (b) the number of unknown words arriving at each moment, while the class polarity may also change. Our results show that active learning on a stream of opinionated documents delivers good performance while requiring only a small selection of labels.
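The loop described above can be sketched as follows. It has three parts: an MNB classifier seeded with a few labeled documents, an unlabeled stream, and a strategy that asks a human expert for labels on demand. This is a minimal illustration under stated assumptions, not ACOSTREAM itself. The names `IncrementalMNB`, `run_stream` and `expert_oracle` are hypothetical. The sampling strategy is an assumed stand-in: uncertainty sampling with a fixed confidence `threshold` and a label `budget` (the fraction of documents that may be queried). The vocabulary grows as unknown words arrive, and Laplace smoothing gives unseen words a small nonzero likelihood.

```python
import math
from collections import defaultdict


class IncrementalMNB:
    """Hypothetical Multinomial Naive Bayes updated one document at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                        # Laplace smoothing constant
        self.doc_counts = defaultdict(int)                        # documents per class
        self.word_counts = defaultdict(lambda: defaultdict(int))  # class -> word -> count
        self.total_words = defaultdict(int)                       # tokens per class
        self.vocab = set()                                        # grows as new words arrive

    def update(self, tokens, label):
        """Add one labeled document to the per-class counts."""
        self.doc_counts[label] += 1
        for w in tokens:
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocab.add(w)

    def predict_proba(self, tokens):
        """Return normalized posteriors P(c | doc); needs at least one labeled doc."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab | set(tokens))  # include still-unknown words in the vocabulary size
        log_post = {}
        for c, n_c in self.doc_counts.items():
            lp = math.log(n_c / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for w in tokens:
                lp += math.log((self.word_counts[c].get(w, 0) + self.alpha) / denom)
            log_post[c] = lp
        top = max(log_post.values())  # subtract the max for numerical stability
        unnorm = {c: math.exp(lp - top) for c, lp in log_post.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}


def run_stream(model, stream, oracle, threshold=0.75, budget=0.2):
    """Classify each arriving document; query the oracle when the model is unsure.

    Assumed strategy: request a label if the top posterior is below `threshold`
    and the number of queries so far stays below `budget` times the documents seen.
    """
    queried = 0
    predictions = []
    for i, tokens in enumerate(stream, start=1):
        proba = model.predict_proba(tokens)
        pred = max(proba, key=proba.get)
        predictions.append(pred)
        if proba[pred] < threshold and queried < budget * i:
            model.update(tokens, oracle(tokens))  # the expert's label updates the counts
            queried += 1
    return predictions, queried


def expert_oracle(tokens):
    """Hypothetical human expert, simulated with a tiny keyword list."""
    return "pos" if {"love", "great", "awesome"} & set(tokens) else "neg"


if __name__ == "__main__":
    model = IncrementalMNB()
    model.update("great phone love it".split(), "pos")       # initial labeled seed
    model.update("terrible battery hate it".split(), "neg")
    stream = [s.split() for s in ["love the screen", "hate the camera", "awesome app"]]
    preds, n_queries = run_stream(model, stream, expert_oracle, budget=0.5)
    print(preds, n_queries)
```

The sketch has no forgetting mechanism, so old counts never decay. A stream whose class distribution or word polarities drift would need extra adaptation beyond what is shown here, and ACOSTREAM's own adaptation and sampling strategies may differ.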