Fractal dimension pattern-based multiresolution analysis for rough estimator of speaker-dependent audio emotion recognition

As a general means of expression, audio has attracted much attention, and its analysis and recognition have wide applications in the real world. Audio emotion recognition (AER) attempts to infer a speaker's emotional state from a given utterance signal, and has been widely studied for its role in friendly human-machine interfaces. Unlike existing works, speaker-dependent patterns of audio emotions are modeled, and fractal dimension features are calculated for acoustic feature extraction. The method efficiently captures intrinsic characteristics of auditory emotions, with utterance features derived from the fractal dimensions of each sub-band. Experimental results show that the proposed method provides competitive performance for audio emotion recognition.
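As an illustration of the fractal-dimension feature idea, the following is a minimal sketch (not the paper's exact algorithm) of Katz's waveform fractal dimension (Katz, 1988), one of the classical estimators used for 1-D signals; the function name `katz_fd` is an assumption for this sketch.

```python
import math

def katz_fd(x):
    """Katz fractal dimension of a 1-D waveform (Katz, 1988).

    FD = log10(n) / (log10(n) + log10(d / L)), where L is the total
    curve length, d the maximum distance from the first sample, and
    n the number of steps in the waveform.
    """
    n = len(x) - 1
    # Total curve length: sum of successive sample-to-sample distances.
    L = sum(abs(x[i + 1] - x[i]) for i in range(n))
    # Planar extent: farthest any sample gets from the first one.
    d = max(abs(x[i] - x[0]) for i in range(1, len(x)))
    return math.log10(n) / (math.log10(n) + math.log10(d / L))

# A straight line has dimension 1; a rougher waveform scores higher.
line = [0.1 * i for i in range(100)]
tone = [math.sin(2 * math.pi * i / 20) for i in range(200)]
```

In a pipeline like the one described, the utterance would first be decomposed into wavelet sub-bands (e.g. with a library such as PyWavelets), and an estimator of this kind would be applied to each sub-band signal to form the acoustic feature vector.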
