Experimenting with Distant Supervision for Emotion Classification

We describe a set of experiments that use automatically labelled data, obtained with no manual intervention, to train supervised classifiers for multi-class emotion detection in Twitter messages. By cross-validating between models trained on different labellings of the same six basic emotion classes, and by testing on manually labelled data, we conclude that the method is suitable for some emotions (happiness, sadness and anger) but less able to distinguish the others, and that some labelling conventions suit certain emotions better than others.
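A minimal sketch of the distant-labelling step may help make the idea concrete. It assumes a hypothetical labelling convention based on hashtags, and it assumes the six classes are Ekman's basic emotions. The marker vocabulary, and the functions `distant_label` and `build_training_set`, are illustrative only and are not the conventions or code used in the experiments. A tweet is kept only if its markers point to exactly one class. The markers are then stripped so that a classifier cannot simply learn the labelling source.

```python
# Hypothetical labelling convention: hashtag markers mapped to emotion classes.
# The six classes are assumed to be Ekman's basic emotions; the marker
# vocabulary is illustrative and NOT the one used in the experiments.
HASHTAG_CONVENTION = {
    "happiness": {"#happy", "#joy"},
    "sadness": {"#sad", "#unhappy"},
    "anger": {"#angry", "#mad"},
    "fear": {"#scared", "#afraid"},
    "surprise": {"#surprised", "#shocked"},
    "disgust": {"#disgusted", "#gross"},
}


def distant_label(tweet, convention):
    """Return (cleaned_text, label) if the tweet carries markers of exactly
    one emotion class under `convention`, otherwise None.

    Uses simple whitespace tokenisation and case-insensitive matching."""
    tokens = tweet.split()
    lowered = [t.lower() for t in tokens]
    # Collect every class whose markers appear in the tweet.
    found = {label for label, markers in convention.items()
             if any(t in markers for t in lowered)}
    if len(found) != 1:
        return None  # no marker, or conflicting markers: discard the tweet
    all_markers = set().union(*convention.values())
    # Remove the markers so the classifier must rely on the remaining text.
    cleaned = " ".join(t for t, low in zip(tokens, lowered)
                       if low not in all_markers)
    return cleaned, found.pop()


def build_training_set(tweets, convention):
    """Apply distant labelling to raw tweets, keeping only unambiguous ones."""
    labelled = (distant_label(t, convention) for t in tweets)
    return [pair for pair in labelled if pair is not None]


if __name__ == "__main__":
    tweets = ["Finally home #happy", "Lost my keys #sad #angry",
              "No tags here", "What a day #JOY"]
    print(build_training_set(tweets, HASHTAG_CONVENTION))
```

Running the script keeps only the two unambiguous tweets: `[('Finally home', 'happiness'), ('What a day', 'happiness')]`. The same raw tweets could be labelled under a second convention, such as a different marker set, by passing another dictionary. A model trained on one labelling and tested on the other then gives the kind of cross-labelling comparison the abstract describes.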
