Paper Information - External Attention LSTM Models for Cognitive Load Classification from Speech

External Attention LSTM Models for Cognitive Load Classification from Speech

Cognitive Load (CL) refers to the amount of mental demand that a given task imposes on an individual's cognitive system, and it can affect his/her productivity in very high load situations. In this paper, we propose an automatic system capable of classifying the CL level of a speaker by analyzing his/her voice. We focus on the use of Long Short-Term Memory (LSTM) networks with different weighted pooling strategies, such as mean-pooling, max-pooling, last-pooling and a logistic regression attention model. In addition, as an alternative to the previous methods, we propose a novel attention mechanism, called the external attention model, that uses external cues, such as log-energy and fundamental frequency, to weight the contribution of each LSTM temporal frame, avoiding the need for a large amount of data to train the attention model. Experiments show that the LSTM-based system with the external attention model significantly outperforms the baseline system based on Support Vector Machines (SVM), the LSTM-based systems with the conventional weighted pooling schemes, and the LSTM-based system with the logistic regression attention model.
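The pooling strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy values, and in particular the softmax normalization of the external cue are assumptions made here for illustration, since the abstract does not state how the cue is converted into frame weights. The logistic regression attention model is omitted because it requires trained parameters.

```python
import numpy as np

def mean_pooling(h):
    """Average the LSTM outputs over time; h has shape (frames, units)."""
    return h.mean(axis=0)

def max_pooling(h):
    """Take the element-wise maximum over time."""
    return h.max(axis=0)

def last_pooling(h):
    """Keep only the output of the final frame."""
    return h[-1]

def external_attention_pooling(h, cue):
    """Weight each frame by an external per-frame cue (e.g. log-energy).

    Assumption: the weights are a softmax over the cue values. The
    abstract does not specify the normalization actually used.
    """
    cue = np.asarray(cue, dtype=float)
    w = np.exp(cue - cue.max())   # subtract the max for numerical stability
    w /= w.sum()                  # weights sum to 1 over the frames
    return w @ h, w

# Hypothetical toy input: 3 frames of 2-unit LSTM outputs and a made-up cue.
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])
log_energy = np.array([0.0, 0.0, np.log(2.0)])

pooled, weights = external_attention_pooling(h, log_energy)
print(weights)  # [0.25 0.25 0.5 ]
print(pooled)   # [1.25 1.25]
```

In this toy case the third frame has the highest log-energy, so it receives half of the total weight. No attention parameters are learned, which is the property the abstract credits with removing the need for a large training set.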

Juan Manuel Montero-Martínez | Ascensión Gallardo-Antolín
