Sad state analysis of speech signals using different clustering algorithms

The objective of this paper is to analyse the sad state of speech emotion using voice quality features. This can help family members, relatives, well-wishers, and medical practitioners take timely action before the onset of deep depression that may endanger the person's life. Fuzzy C-means and K-means clustering algorithms have been used to draw a boundary between sad speech and neutral utterances using voice quality features such as jitter, shimmer, noise-to-harmonic ratio (NHR), and harmonic-to-noise ratio (HNR). The results suggest that shimmer gives the highest accuracy among these features for the sad state, followed by jitter, whereas for neutral utterances HNR performs best among all features, followed by shimmer.
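As a rough illustration of the two-cluster setup described above, the sketch below clusters a synthetic set of utterance-level voice quality vectors (jitter, shimmer, NHR, HNR) with scikit-learn's KMeans and a textbook fuzzy C-means routine. The feature values, cluster count, and fuzzifier m = 2 are illustrative assumptions, not the paper's data or implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic per-utterance voice quality vectors: [jitter %, shimmer %, NHR, HNR dB].
# The means and spreads below are illustrative placeholders, not the paper's data.
neutral = rng.normal([0.5, 3.0, 0.12, 22.0], [0.1, 0.5, 0.03, 2.0], (50, 4))
sad = rng.normal([0.9, 5.5, 0.20, 17.0], [0.2, 0.8, 0.05, 2.5], (50, 4))
X = np.vstack([neutral, sad])

# K-means: hard assignment of each utterance to one of two clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5):
    """Textbook fuzzy C-means: soft membership degrees instead of hard labels."""
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships in each row sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers, axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)     # standard FCM update
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

centers, U = fuzzy_c_means(X)
fcm_labels = U.argmax(axis=1)  # harden soft memberships for comparison
print("K-means cluster sizes:", np.bincount(km_labels))
print("FCM cluster sizes:   ", np.bincount(fcm_labels))
```

In practice the features would be extracted from real recordings (for example with Praat), and the resulting cluster labels would be compared against annotated sad/neutral classes to obtain the per-feature accuracies the abstract reports.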
