Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network
George Trigeorgis | Fabien Ringeval | Erik Marchi | Björn Schuller | Stefanos Zafeiriou | Mihalis A. Nicolaou | Raymond Brueckner
[1] Geoffrey E. Hinton,et al. Learning a better representation of speech soundwaves using restricted Boltzmann machines , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Jean-Philippe Thiran,et al. Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data , 2015, Pattern Recognit. Lett..
[3] Hans-Günter Hirsch,et al. Improved speech recognition using high-pass filtering of subband envelopes , 1991, EUROSPEECH.
[4] Klaus R. Scherer,et al. Vocal communication of emotion: A review of research paradigms , 2003, Speech Commun..
[5] Fabien Ringeval,et al. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[6] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[7] L. Lin,et al. A concordance correlation coefficient to evaluate reproducibility. , 1989, Biometrics.
[8] Fabien Ringeval,et al. AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data , 2015, AVEC@ACM Multimedia.
[9] Hermann Ney,et al. Gammatone Features and Feature Combination for Large Vocabulary Speech Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[10] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[11] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[12] Simon Dixon,et al. An End-to-End Neural Network for Polyphonic Music Transcription , 2015, ArXiv.
[13] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Richard Rose,et al. Architectures for deep neural network based acoustic models defined over windowed speech waveforms , 2015, INTERSPEECH.
[15] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[16] Yongzhao Zhan,et al. Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks , 2014, IEEE Transactions on Multimedia.
[17] Dimitri Palaz,et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks , 2013, INTERSPEECH.
[18] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[19] Benjamin Schrauwen,et al. End-to-end learning for music audio , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Dimitri Palaz,et al. Analysis of CNN-based speech recognition system using raw speech as input , 2015, INTERSPEECH.
[21] Simon Dixon,et al. An End-to-End Neural Network for Polyphonic Piano Music Transcription , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Fabio Valente,et al. The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism , 2013, INTERSPEECH.
[23] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[24] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[25] Carlos Busso,et al. Correcting Time-Continuous Emotional Labels by Modeling the Reaction Lag of Evaluators , 2015, IEEE Transactions on Affective Computing.
[26] Patrick Thiam,et al. Ensemble Methods for Continuous Affect Recognition: Multi-modality, Temporality, and Challenges , 2015, AVEC@ACM Multimedia.
[27] Björn W. Schuller,et al. The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing , 2016, IEEE Transactions on Affective Computing.
[28] Christian Biemann,et al. Using representation learning and out-of-domain data for a paralinguistic speech task , 2015, INTERSPEECH.
[29] C. Nickerson. A note on a concordance correlation coefficient to evaluate reproducibility , 1997 .
[30] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.