The Challenge of Sentiment Quantification

Among the many challenges that sentiment analysis (SA) faces, I want to concentrate on one that has received little attention within the SA community but is going to play a major role in future applications: Sentiment Quantification (SQ) (Esuli and Sebastiani, 2010). Quantification is defined as the task of estimating, via supervised learning, the prevalence (i.e., relative frequency) of the classes of interest in a set of unlabelled data (Forman, 2008); examples of SQ are (i) determining the prevalence of endorsements in a set of tweets about a political candidate, or (ii) determining the prevalence of rebuttals in a set of reviews of a given book. A naïve way to tackle quantification is to classify each unlabelled item independently and compute the fraction of items that have been assigned to the class. However, a good classifier is not necessarily a good quantifier: in the binary case, even if the total number of errors FP + FN (false positives plus false negatives) is comparatively small, quantification accuracy is poor whenever FP and FN differ significantly, because the two kinds of error cancel out in the count only when FP = FN, which is the case of perfect quantification. This has led researchers to study quantification as a task in its own right, rather than as a byproduct of classification. Within SA, quantification plays a major role, since in many applications we are interested in estimating sentiment not at the individual level, but at the aggregate level. For instance, when SA is applied to tweets, it is rarely (if at all) the case that we are interested in the sentiment conveyed by an individual tweet (Gao and Sebastiani, 2015): it is the sentiment of the crowd, and how it is distributed, that
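The classify-and-count argument can be made concrete with a minimal sketch. The data, the two sets of predictions, and the function names below are hypothetical, chosen only for illustration. The sketch compares a classifier whose errors are balanced (FP = FN) with a more accurate one whose errors all fall on one side.

```python
def classify_and_count(y_pred):
    """Naive quantifier: fraction of items predicted as positive."""
    return sum(y_pred) / len(y_pred)


def error_counts(y_true, y_pred):
    """Return (FP, FN) for binary labels encoded as 0/1."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


# Hypothetical test set: 100 items, 30 of them truly positive.
y_true = [1] * 30 + [0] * 70
true_prevalence = sum(y_true) / len(y_true)

# Hypothetical classifier A: 10 errors, balanced (FP = 5, FN = 5).
y_pred_a = [1] * 25 + [0] * 5 + [1] * 5 + [0] * 65
# Hypothetical classifier B: only 8 errors, all false negatives (FP = 0, FN = 8).
y_pred_b = [1] * 22 + [0] * 8 + [0] * 70

for name, y_pred in (("A", y_pred_a), ("B", y_pred_b)):
    fp, fn = error_counts(y_true, y_pred)
    accuracy = 1 - (fp + fn) / len(y_true)
    estimate = classify_and_count(y_pred)
    # The estimation error equals (FP - FN) / N, so it vanishes when FP = FN.
    print(f"{name}: accuracy={accuracy:.2f} FP={fp} FN={fn} "
          f"estimated={estimate:.2f} true={true_prevalence:.2f}")
```

In this toy setting, classifier B is the better classifier (accuracy 0.92 against 0.90) but the worse quantifier: it estimates a prevalence of 0.22 against a true value of 0.30, whereas A's balanced errors cancel out and its estimate is exact.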

[1] George Forman, et al. Quantifying counts and costs via classification, 2008, Data Mining and Knowledge Discovery.

[2] Wei Gao, et al. Tweet sentiment: From classification to quantification, 2015, 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[3] Andrea Esuli, et al. Sentiment Quantification, 2010, IEEE Intell. Syst.