Emotional sounds of crowds: spectrogram-based analysis using deep learning

Crowds express emotions as a collective individual, which is evident from the sounds a crowd produces at particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. A critical question concerning the novel concept of crowd emotions is whether the emotional content of crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, in which deep-learning classification is applied to spectrogram images derived from sound transformations. In this work, we present a technique based on generating sound spectrograms from fixed-length fragments extracted from audio clips recorded at high-attendance events, where the crowd acts as a collective individual. Transfer learning is applied to a convolutional neural network pre-trained on low-level features using the well-known large-scale ImageNet dataset of visual knowledge. The original sound clips are filtered and normalized in amplitude for correct spectrogram generation, after which the domain-specific features are fine-tuned. Experiments on the trained convolutional neural network show promising performance of the proposed model in classifying the emotions of the crowd.
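The preprocessing stage described above (amplitude normalization of the recorded clips, followed by spectrogram generation from fixed-length fragments) can be sketched as follows. This is an illustrative outline in NumPy, not the authors' implementation; the window size, hop length, and dB scaling are assumptions chosen for the example.

```python
import numpy as np

def normalize_amplitude(signal, peak=0.9):
    # Peak-normalize so spectrograms from different clips are comparable
    m = np.max(np.abs(signal))
    return signal if m == 0 else signal * (peak / m)

def spectrogram(signal, frame_len=512, hop=256):
    # Magnitude spectrogram via a short-time Fourier transform (Hann window)
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    mags = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    # Log (dB) scale, as commonly used when rendering spectrogram images
    return 20 * np.log10(mags + 1e-10).T  # shape: (freq_bins, time_frames)

# Example: a 2-second synthetic fragment at 16 kHz standing in for a crowd clip
sr = 16000
t = np.arange(2 * sr) / sr
clip = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(t.size)
spec = spectrogram(normalize_amplitude(clip))
print(spec.shape)  # (257, 124)
```

Each resulting spectrogram would then be rendered as an image and fed to an ImageNet-pretrained CNN whose final layers are fine-tuned on the crowd-emotion classes, in line with the transfer-learning approach the abstract describes.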
