Apprentissage profond appliqué à la reconnaissance des émotions dans la voix. (Deep learning applied to speech emotion recognition)
[1] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Nitesh V. Chawla,et al. SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..
[3] Louis-Philippe Morency,et al. Representation Learning for Speech Emotion Recognition , 2016, INTERSPEECH.
[4] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..
[5] Kjell Elenius,et al. Emotion Recognition , 2009, Computers in the Human Interaction Loop.
[6] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[7] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] W. Cannon. The James-Lange theory of emotions: a critical examination and an alternative theory , 1927, American Journal of Psychology.
[9] Ngoc Thang Vu,et al. Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech , 2017, INTERSPEECH.
[10] Björn W. Schuller,et al. Deep neural networks for acoustic emotion recognition: Raising the benchmarks , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Mohamed Chtourou,et al. On the training of recurrent neural networks , 2011, Eighth International Multi-Conference on Systems, Signals & Devices.
[12] Laurence Devillers,et al. Designing an Emotion Detection System for a Socially Intelligent Human-Robot Interaction , 2012, Natural Interaction with Robots, Knowbots and Smartphones, Putting Spoken Dialog Systems into Practice.
[13] Tao Wang,et al. End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).
[14] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[15] Carlos Busso,et al. Emotion recognition using a hierarchical binary decision tree approach , 2011, Speech Commun..
[16] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[17] Björn W. Schuller,et al. The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing , 2016, IEEE Transactions on Affective Computing.
[18] Sander Bohte,et al. Conditional Time Series Forecasting with Convolutional Neural Networks , 2017, 1703.04691.
[19] William J. Christmas,et al. When Face Recognition Meets with Deep Learning: An Evaluation of Convolutional Neural Networks for Face Recognition , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[20] Lori Lamel,et al. Challenges in real-life emotion annotation and machine learning based detection , 2005, Neural Networks.
[21] Jinyu Li,et al. Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks. , 2013, ICLR 2013.
[22] Georges Linarès,et al. La parole spontanée : transcription et traitement , 2008 .
[23] G. Mounin. Dictionnaire de la linguistique , 1995 .
[24] Kim Gerdes,et al. Prosodic hierarchy and spectral realization of vowels in French , 2010, Actes d'IDP 09.
[25] Amit Agarwal,et al. CNTK: Microsoft's Open-Source Deep-Learning Toolkit , 2016, KDD.
[26] Carlos Busso,et al. MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception , 2017, IEEE Transactions on Affective Computing.
[27] Che-Wei Huang,et al. Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition , 2016, INTERSPEECH.
[28] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[29] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[30] Björn W. Schuller,et al. CINEMO - A French Spoken Language Resource for Complex Emotions: Facts and Baselines , 2010, LREC.
[31] C. Darwin. The Expression of the Emotions in Man and Animals , .
[32] Ning Qian,et al. On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.
[33] Laurence Vidrascu,et al. Analyse et détection des émotions verbales dans les interactions orales. (Analysis and detection of emotions in real-life spontaneous speech) , 2007 .
[34] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Carlos Busso,et al. IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.
[37] Louis ten Bosch,et al. Information Encoding by Deep Neural Networks: What Can We Learn? , 2018, INTERSPEECH.
[38] Efthymios Tzinis,et al. Segment-based speech emotion recognition using recurrent neural networks , 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII).
[39] J. Russell,et al. Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant. , 1999, Journal of personality and social psychology.
[40] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[41] Laurence Devillers,et al. Protocol CINEMO: The use of fiction for collecting emotional data in naturalistic controlled oriented context , 2009, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.
[42] Luc Mioulet,et al. Reconnaissance de l'écriture manuscrite avec des réseaux récurrents. (Recurrent neural network for handwriting recognition) , 2015 .
[43] Sebastian Ruder,et al. An overview of gradient descent optimization algorithms , 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.
[44] Dong Yu,et al. Efficient and effective algorithms for training single-hidden-layer neural networks , 2012, Pattern Recognit. Lett..
[45] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[46] N. Martaj,et al. Réseaux de neurones , 2010 .
[47] Laurence Devillers,et al. CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation , 2018, Workshop on Speech, Music and Mind (SMM 2018).
[48] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Marie Tahon,et al. Inference of Human Beings’ Emotional States from Speech in Human–Robot Interactions , 2015, Int. J. Soc. Robotics.
[50] Geoffrey E. Hinton,et al. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units , 2015, ArXiv.
[51] J. Dubois. Dictionnaire de linguistique , 1973 .
[52] D. Isaacowitz,et al. Emotion in Cognition , 2015 .
[53] Kornel Laskowski,et al. Combining Efforts for Improving Automatic Classification of Emotional User States , 2006 .
[54] Anne Lacheret,et al. The role of intonation and voice quality in the affective speech perception , 2007, INTERSPEECH.
[55] Xiaodong Cui,et al. Data Augmentation for Deep Neural Network Acoustic Modeling , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[56] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[57] J. Russell. A circumplex model of affect. , 1980 .
[58] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .
[59] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[60] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[61] George N. Votsis,et al. Emotion recognition in human-computer interaction , 2001, IEEE Signal Process. Mag..
[62] Dong Yu,et al. Speech emotion recognition using deep neural network and extreme learning machine , 2014, INTERSPEECH.
[63] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[64] George Trigeorgis,et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[65] Björn W. Schuller,et al. Patterns, prototypes, performance: classifying emotional user states , 2008, INTERSPEECH.
[66] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Grigoriy Sterling,et al. Emotion Recognition From Speech With Recurrent Neural Networks , 2017, ArXiv.
[68] Ron Hoory,et al. Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms , 2017, INTERSPEECH.
[69] Björn W. Schuller,et al. An Image-based Deep Spectrum Feature Representation for the Recognition of Emotional Speech , 2017, ACM Multimedia.
[70] Vinay Kumar Mittal,et al. Emotion recognition from speech signal , 2017, TENCON 2017 - 2017 IEEE Region 10 Conference.
[71] Björn Schuller,et al. The Automatic Recognition of Emotions in Speech , 2011 .
[72] Jonathan G. Fiscus,et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[73] Sean A. Spence,et al. Descartes' Error: Emotion, Reason and the Human Brain , 1995 .
[74] Emily Mower Provost,et al. Progressive Neural Networks for Transfer Learning in Emotion Recognition , 2017, INTERSPEECH.
[75] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[76] Razvan Pascanu,et al. Advances in optimizing recurrent networks , 2012, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[77] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[78] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[79] Carlos Busso,et al. Interpreting ambiguous emotional expressions , 2009, 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.
[80] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[81] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[82] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[83] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[84] Rohit Kumar,et al. Emotion Recognition using Acoustic and Lexical Features , 2012, INTERSPEECH.
[85] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[86] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[87] Patrice Y. Simard,et al. High Performance Convolutional Neural Networks for Document Processing , 2006 .
[88] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[89] Laurence Devillers,et al. Représentation et détection des émotions dans des dialogues enregistrés dans un centre d'appel. Des émotions complexes dans des données réelles , 2006, Rev. d'Intelligence Artif..
[90] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[91] Charu C. Aggarwal,et al. Neural Networks and Deep Learning , 2018, Springer International Publishing.
[92] Nicolas Sturmel,et al. Signal Reconstruction from STFT Magnitude: A State of the Art , 2011 .
[93] Mark Hasegawa-Johnson,et al. Visualizing Phoneme Category Adaptation in Deep Neural Networks , 2018, INTERSPEECH.
[94] Blockin,et al. Vocal Expression of Emotion , 2004 .
[95] Honglak Lee,et al. Deep learning for robust feature generation in audiovisual emotion recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[96] Fei-Fei Li,et al. Visualizing and Understanding Recurrent Networks , 2015, ArXiv.
[97] D. Sander,et al. Théories et concepts contemporains en psychologie de l’émotion , 2010 .
[98] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[99] Kostas Karpouzis,et al. The HUMAINE Database: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data , 2007, ACII.
[100] Chee Kheong Siew,et al. Extreme learning machine: Theory and applications , 2006, Neurocomputing.
[101] Li Lee,et al. A frequency warping approach to speaker normalization , 1998, IEEE Trans. Speech Audio Process..
[102] Jinkyu Lee,et al. High-level feature representation using recurrent neural network for speech emotion recognition , 2015, INTERSPEECH.
[103] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[104] Ramón López-Cózar,et al. On the Use of Kappa Coefficients to Measure the Reliability of the Annotation of Non-acted Emotions , 2008, PIT.
[105] Sang Joon Kim,et al. A Mathematical Theory of Communication , 2006 .
[106] K. Scherer. Vocal affect expression: a review and a model for future research. , 1986, Psychological bulletin.
[107] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[108] M. Tahon,et al. Analyse acoustique de la voix émotionnelle de locuteurs lors d’une interaction humain-robot , 2012 .
[109] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[110] Marie Tahon,et al. Towards a Small Set of Robust Acoustic Features for Emotion Recognition: Challenges , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[111] Anne Lacheret. Le corps en voix ou l'expression prosodique des émotions , 2011 .
[112] Daniel Luzzati. Le fenêtrage syntaxique: une méthode d'analyse et d'évaluation de l'oral spontané , 2004 .
[113] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[114] Björn W. Schuller,et al. Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First Findings , 2017, INTERSPEECH.
[115] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.
[116] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[117] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[118] Marie Tahon,et al. Détection des états affectifs lors d’interactions parlées : robustesse des indices non verbaux [Automatic in-voice affective state detection in spontaneous speech: robustness of non-verbal cues] , 2014, TAL.
[119] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[120] D. Hubel,et al. Receptive fields and functional architecture of monkey striate cortex , 1968, The Journal of physiology.
[121] Tanaya Guha,et al. Learning Spontaneity to Improve Emotion Recognition In Speech , 2018, INTERSPEECH.
[122] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[123] Colin Raffel,et al. Lasagne: First release. , 2015 .