Dialog speech sentiment classification for imbalanced datasets

Speech is the most common way humans express their feelings, and sentiment analysis uses tools such as natural language processing and computational algorithms to identify the polarity of these feelings. Although the field has advanced considerably over the last two decades, effectively detecting underrepresented sentiments across different kinds of datasets remains challenging. In this paper, we apply unimodal and bimodal analysis to short dialog utterances and gain insight into the main factors that aid sentiment detection, particularly for underrepresented classes, in datasets with and without an inherent sentiment component. Furthermore, we propose an architecture that uses a learning rate scheduler and different monitoring criteria and achieves state-of-the-art results on the imbalanced SWITCHBOARD sentiment dataset.
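
To illustrate what a learning rate scheduler driven by a monitoring criterion can look like, the following is a minimal pure-Python sketch. It is not the architecture proposed in this paper: the class name `PlateauScheduler`, the reduction factor, the patience value, and the choice of macro-averaged recall as the monitored quantity are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so a rare class counts as much as a common one."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


@dataclass
class PlateauScheduler:
    """Hypothetical reduce-on-plateau scheduler (illustrative, not the paper's)."""
    lr: float = 1e-3           # current learning rate
    factor: float = 0.5        # multiplicative decay applied on a plateau (assumed value)
    patience: int = 2          # non-improving epochs tolerated before decaying (assumed value)
    mode: str = "max"          # "max" for scores such as recall, "min" for losses
    best: Optional[float] = None
    bad_epochs: int = 0

    def step(self, metric):
        """Record one epoch's monitored value and return the (possibly reduced) learning rate."""
        improved = (
            self.best is None
            or (self.mode == "max" and metric > self.best)
            or (self.mode == "min" and metric < self.best)
        )
        if improved:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

In this sketch, the monitored quantity matters because on an imbalanced dataset a model that predicts only the majority class can score high accuracy while its macro-averaged recall stays low; which criterion works best for a given dataset is an empirical question that the sketch does not answer.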
