Emojilization: An Automated Method For Speech to Emoji-Labeled Text

Speech-to-text (STT) plays a significant role in voice user interfaces (VUIs). While STT preserves the necessary semantic information in the converted text, it generally captures little or no emotional information. In this paper, we present an emojilization tool that automatically attaches relevant emojis to STT-generated text by analyzing both textual and acoustic features of the speech signal. For a given voice message, the tool selects the most representative emoji from the 64 most commonly used emojis. We conducted a pilot study with 34 participants, in which our tool labeled 159 utterances with emojis, and evaluated the emotion restoration effect. The results indicate that the proposed tool effectively compensates for the emotion loss.
