Classification of Emotions in Internet Chat: An Application of Machine Learning Using Speech Phonemes

This article reports our progress in the classification of expressions of emotion in network-based chat conversations. Emotion detection of this nature is currently an active area of research [8] [9]. We detail a linguistic approach to the tagging of chat conversation with appropriate emotion tags. In our approach, textual chat messages are automatically converted into speech, and instance vectors are then generated from frequency counts of the speech phonemes present in each message. In combination with other statistically derived attributes, the instance vectors are used in various machine-learning frameworks to build classifiers for emotional content. Based on the standard metrics of precision and recall, we report results exceeding 90% accuracy when employing k-nearest-neighbor learning. Our approach has thus shown promise in discriminating emotional from non-emotional content in independent testing.

1. INTRODUCTION

With the information revolution well under way, the amount of communication and the number of communication methods are growing rapidly. People converse frequently via a number of media. One such medium is Internet chat using various instant messaging clients (e.g., AOL Instant Messenger, MSN Messenger, etc.). These communications provide an excellent platform for research on informal communication. One research area that has gained much interest recently is the tagging of emotional content in informal conversation.

There are several important applications of such tagging. One application is related to homeland defense. In fact, the research reported herein was motivated by a chat-mining research project we conducted at the behest of the Intelink intelligence network. Intelink is a secure military communications channel used by the US government for the critical exchange of information. Intelink's goal is to monitor chat conversation over the network and map relationships between participants and their topics of conversation to determine the appropriateness of usage and the effectiveness of the communication network. They are interested in information such as the frequency of employee communication, the topics discussed, the conversational participants, and the emotional tone and focus of the conversations. Such information can be modeled using social and semantic networks constructed from chat data [5] [12].

1 NSF Grant Number EIA-0070457, Division of Experimental & Integrative Activities.

A second important application of emotion tags is the ability to add user feedback to existing instant messaging services. Based on the textual messages produced by a chat participant, an icon or display can represent the user's current emotional state as inferred by the system [2]. In addition, the use of emotion detection in user interface design is currently an area of active research. This research is part of the field of Affective Computing, first described by Picard [9]. A number of textual, verbal, and non-verbal methods have been applied to allow interfaces and systems to adapt to the emotional state of the user. This area has sparked interest in many fields of research, exciting even researchers such as Marvin Minsky, who has recently written a book on the subject [8].

This paper details a linguistic approach to identifying emotions in chat data. The approach centers on the reproduction of speech from the textual messages logged from an instant messaging client. Speech phonemes present in the message are then modeled to identify the emotion expressed.
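To make the pipeline concrete, the sketch below shows one way a chat message can be reduced to a phoneme-frequency instance vector. It is illustrative only: the CMU Pronouncing Dictionary (via NLTK) stands in for the text-to-speech conversion used in our system, and the helper names `message_to_phonemes` and `instance_vector` are hypothetical, not components of our implementation.

```python
# Illustrative sketch: reduce a chat message to a phoneme-frequency vector.
# The CMU Pronouncing Dictionary is used here as a stand-in for a full
# text-to-speech engine; words absent from the dictionary are simply skipped,
# mirroring the graceful degradation discussed below.
from collections import Counter
import re

import nltk
from nltk.corpus import cmudict

nltk.download("cmudict", quiet=True)   # phoneme lookup table (assumption: proxy for TTS)
PRONOUNCE = cmudict.dict()             # word -> list of candidate phoneme sequences

def message_to_phonemes(message: str) -> list[str]:
    """Map a chat message to a flat list of phonemes, skipping unknown words."""
    phonemes = []
    for word in re.findall(r"[a-z']+", message.lower()):
        pronunciations = PRONOUNCE.get(word)
        if pronunciations:
            # Strip stress digits (e.g. 'AH0' -> 'AH') and take the first pronunciation.
            phonemes.extend(p.rstrip("012") for p in pronunciations[0])
    return phonemes

def instance_vector(message: str, phoneme_set: list[str]) -> list[float]:
    """Frequency count of each phoneme, normalized by the message's phoneme total."""
    counts = Counter(message_to_phonemes(message))
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in phoneme_set]
```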
This approach offers a number of advantages over more complex methods. The primary advantage is that it degrades gracefully. Many messages in chat data contain misspellings, do not adhere to grammar rules, and may not contain complete words. We have shown that our approach is robust in the presence of such noise.

The remainder of this paper details our research in the detection of emotion in chat data. In section two, we review related work. In section three, we present our approach. In section four, we discuss and provide a detailed analysis of our experimental results. In section five, we identify future work, and we draw conclusions in section six. We close in section seven with acknowledgements of those who have aided us in this research.

2. RELATED WORK

Much work has been done on the identification of emotion for various applications; the study of emotion for Affective Computing alone is a rather large field. The existing approaches to lingual emotion detection can be broken into a few categories: Non-verbal, Semantic, and Symbolic.

2.1 Non-verbal

This category of emotion detection focuses on spoken language. It analyzes characteristics of the speech signal such as prosody and spectral information [10]. Using such features of spoken language has proven very successful in identifying emotion. This category is also the most intuitive: we have all experienced instances where the "tone" of someone's voice indicates the emotion they are expressing. Studies in this category are, in general, similar in form to the study reported in this paper. They use a training corpus of sound clips and have domain experts (in this case, simply people fluent in English) declare the emotion class of each clip. They then build models from the non-verbal attributes of the sound clips and test the performance of these models. While approaches in this category have been shown to be very successful, they are not readily applicable in the domain of chat conversation: there is no non-verbal information associated with the messages that are sent. Furthermore, spoken language is by nature more accurate than typed messages, so methods to recover from misspoken words are not necessary and would not substantially affect the results.

2.2 Semantic

This category of emotion detection attempts to understand the underlying semantics of the language to determine the emotion class. It is more complex than the approach described above because it relies upon the ability to recognize certain key words in the language. One recent and powerful implementation of this method can be found in Liu, Lieberman, and Selker [6]. Their approach uses a large real-world commonsense database to form semantic notions about the story being told in a given sample of natural language; in this case, they integrated the system into an email client to provide feedback to the user. In their approach, affective sentences are extracted from the database. A number of models, each representing a different emotion, are built from these sentences, and the models compete to identify an emotion class for the text. The final models are used to identify the emotion of segments in the text and thus tag them with their proper emotion class. This method is very robust and attempts to dig deeply into the semantic notion of emotion. It does, however, rely upon a certain quantity of correct text.
If the text is malformed or very short, it would be difficult to create the models this approach requires. Furthermore, this method explicitly chooses to ignore some of the statistical features that show the relationship between emotion and language.

2.3 Symbolic

This category of emotion detection focuses on techniques that attempt to discover patterns in the text that allow emotion tagging. These approaches rely upon syntactic and lexical analyses of the text to discover the emotion class. They attempt to discover clauses and words that identify certain attributes of emotions and then build models that use these attributes to recover the emotion class. One application of this approach can be found in Boucouvalas and Zhe [2]. Their approach utilizes a tagged dictionary to identify the basis of emotion in phrases. It then uses various grammatical features of the phrases to deduce which of the tagged words carries the correct emotion class, and it resolves syntactic features such as negation and tense. This particular system uses a tag set that goes beyond the basic emotion classes and uses the attributes to build a scaled model of emotion; it therefore provides an emotion intensity as well as an emotion class.

Another study performs classification in an overlapping domain, the tagging of posting acts, including emotion, in chat data [12]. This study applies Eric Brill's Transformation-Based Learning (TBL) to the problem of identifying the purpose of messages in chat conversation. Examples of posting acts are statement, yes-no-question, and emotion, where emotion represents a strong expression of any emotion. This method uses a set of templates and contextual information, such as emoticons and other expressive markers, to identify emotions. TBL uses an iterative, error-driven, contextual approach to classify the instances (in this case, chat messages) using the provided templates. This study addresses the problem of malformed grammar and words through the use of regular expressions.

3. OUR APPROACH

In contrast to the methods detailed above, our approach relies on very simple and noise-resistant properties of the data. Thus, it is not necessary to make some of the assumptions made by more complex methods such as semantic analysis and symbolic processing. Our approach attempts to reconstruct the spoken language represented by the chat messages and to leverage that information to understand the properties of the language itself; in this sense it is closest to the Non-verbal approaches. Furthermore, our approach is local to the message being processed and is efficient. A message can be translated into its phonemic equivalent and processed by the model very quickly, which allows real-time emotion classification. This is an improvement over models that require intensive analysis and model generation. Moreover, due to the local nature of the method, messages can be processed out of order and independently of one another.
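As a rough illustration of the classification step, the sketch below feeds phoneme-frequency vectors (reusing the hypothetical `instance_vector` helper from the earlier sketch) to an off-the-shelf k-nearest-neighbor classifier. The training messages, labels, and the choice of k = 3 are placeholders for illustration, not our actual training data or configuration.

```python
# Minimal sketch: feeding phoneme-frequency instance vectors to a k-NN
# classifier. Assumes the instance_vector helper from the earlier sketch.
# The labeled messages and k value below are illustrative placeholders.
from sklearn.neighbors import KNeighborsClassifier

# Standard 39-phoneme ARPAbet inventory (stress markers removed).
ARPABET = ["AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
           "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
           "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
           "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH"]

# Hypothetical labeled training messages (emotion vs. non-emotion).
training_messages = [("i am so happy about this!!", "emotion"),
                     ("meeting moved to 3pm", "non-emotion"),
                     ("this is awful i hate it", "emotion"),
                     ("see attached file for details", "non-emotion")]

X = [instance_vector(text, ARPABET) for text, _ in training_messages]
y = [label for _, label in training_messages]

knn = KNeighborsClassifier(n_neighbors=3)   # k chosen arbitrarily for the sketch
knn.fit(X, y)

# Classify a new, unseen message from its phoneme-frequency vector.
print(knn.predict([instance_vector("wow that is fantastic news", ARPABET)]))
```

Because the instance vectors are short, dense, and cheap to compare, a lazy learner such as k-NN fits the per-message, real-time classification setting described above.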