Can prosody inform sentiment analysis? Experiments on short spoken reviews

While most online content is created using textual interfaces, recent improvements in speech recognition accuracy allow content to be created through speech. This technology lets users share reviews about entities of interest from mobile devices, without delay. This paper builds on previous work on textual sentiment analysis to investigate whether information in the speech signal can be used to predict sentiment from short spoken reviews. For this purpose we collected short spoken reviews from 84 speakers. Results show that models trained on features characterizing the review's pitch significantly outperform a majority-class baseline, without using any textual information. When taking text-based sentiment predictions into account, our results suggest that prosody can alleviate the effect of speech recognition errors on sentiment detection; however, a larger dataset is needed to test whether this can be done without harming performance at low word error rates.
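
As an illustrative sketch only, not the paper's exact pipeline: the kind of experiment described above can be approximated by summarizing each review's pitch (f0) contour with a few global statistics and comparing a simple classifier against a majority-class baseline. The feature set, the pyin pitch tracker from librosa, the scikit-learn models, and the helper names pitch_features and evaluate are all assumptions made for illustration.

```python
# Hypothetical sketch: pitch-statistics features for a spoken-review sentiment
# classifier, scored against a majority-class baseline. Feature set and model
# choice are illustrative, not the configuration used in the paper.
import numpy as np
import librosa
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def pitch_features(wav_path):
    """Summarize one review's f0 contour with a few global statistics."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]            # keep voiced frames only
    if f0.size == 0:                  # no voiced speech detected: fall back to zeros
        return np.zeros(5)
    return np.array([f0.mean(), f0.std(), f0.min(), f0.max(),
                     f0.max() - f0.min()])   # level, spread, and range of pitch

def evaluate(wav_paths, labels):
    """Compare a pitch-only model with a majority-class baseline via cross-validation.
    wav_paths / labels stand in for the collected reviews and their sentiment labels."""
    X = np.vstack([pitch_features(p) for p in wav_paths])
    y = np.asarray(labels)
    baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    prosody = cross_val_score(model, X, y, cv=5)
    print(f"majority-class baseline accuracy: {baseline.mean():.3f}")
    print(f"pitch-feature model accuracy:     {prosody.mean():.3f}")
```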
