Auditory-Inspired End-to-End Speech Emotion Recognition Using 3D Convolutional Recurrent Neural Networks Based on Spectral-Temporal Representation
Jianwu Dang | Masashi Unoki | Zhi Zhu | Zhichao Peng | Masato Akagi
[1] N. Suga, et al. Analysis of information-bearing elements in complex sounds by auditory neurons of bats, 1972, Audiology: Official Organ of the International Society of Audiology.
[2] Louis-Philippe Morency, et al. Learning Representations of Affect from Speech, ICLR 2015.
[3] K. Sen, et al. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds, 2000, The Journal of Neuroscience.
[4] Roy D. Patterson, et al. Improvement of an IIR asymmetric compensation gammachirp filter, 2001.
[5] Richard F. Lyon, et al. A computational model of filtering, detection, and compression in the cochlea, 1982, ICASSP.
[6] George Trigeorgis, et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Birger Kollmeier, et al. An Auditory Inspired Amplitude Modulation Filter Bank for Robust Feature Extraction in Automatic Speech Recognition, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[8] Yongzhao Zhan, et al. Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks, 2014, IEEE Transactions on Multimedia.
[9] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.
[10] A. R. Moller. Unit responses in the rat cochlear nucleus to tones of rapidly varying frequency and amplitude, 1971, Acta Physiologica Scandinavica.
[11] Roy D. Patterson, et al. A Dynamic Compressive Gammachirp Auditory Filterbank, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[12] Tiago H. Falk, et al. Automatic speech emotion recognition using modulation spectral features, 2011, Speech Communication.
[13] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[14] Ngoc Thang Vu, et al. Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech, 2017, INTERSPEECH.
[15] Eero P. Simoncelli, et al. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis, 2011, Neuron.
[16] Shashidhar G. Koolagudi, et al. Emotion recognition from speech: a review, 2012, International Journal of Speech Technology.
[17] Wootaek Lim, et al. Speech emotion recognition using convolutional and recurrent neural networks, 2016, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[18] Masashi Unoki, et al. Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants, 2016, INTERSPEECH.
[19] Dong Yu, et al. Speech emotion recognition using deep neural network and extreme learning machine, 2014, INTERSPEECH.
[20] B. Kollmeier, et al. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers, 1997, The Journal of the Acoustical Society of America.
[21] Björn W. Schuller, et al. Convolutional RNN: An enhanced model for extracting features from sequential data, 2016, International Joint Conference on Neural Networks (IJCNN).
[22] Powen Ru, et al. Multiresolution spectrotemporal analysis of complex sounds, 2005, The Journal of the Acoustical Society of America.
[23] Grigoriy Sterling, et al. Emotion Recognition From Speech With Recurrent Neural Networks, 2017, arXiv.
[24] Carlos Busso, et al. IEMOCAP: interactive emotional dyadic motion capture database, 2008, Language Resources and Evaluation.
[25] R. Patterson, et al. Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform, 1995, The Journal of the Acoustical Society of America.