Discrimination of speech and non-linguistic vocalizations by Non-Negative Matrix Factorization

We introduce features based on Non-Negative Matrix Factorization (NMF) for discriminating between speech and non-linguistic vocalizations such as laughter or breathing, a crucial task in the recognition of spontaneous speech. NMF has been used successfully in speech-related tasks such as denoising and speaker separation. While existing approaches use it as a preprocessing step for conventional speech recognizers, we aim to classify the output of the NMF algorithm directly. To this end, we propose a feature extraction procedure based on a supervised variant of NMF, considering two different algorithms. Applying our approach to a spontaneous speech corpus, we show that adding NMF features to an MFCC-based classifier increases the mean recall of speech and non-linguistic vocalizations by over 2.5% absolute, and the recall of laughter in particular by 6.6% absolute. The improvement is significant at the 0.4% level.

Björn W. Schuller | Felix Weninger
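The abstract does not spell out the feature extraction, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes class-specific NMF bases are learned from training spectrograms with the Lee and Seung multiplicative updates for the generalized KL divergence (one plausible choice; the abstract mentions two algorithms without naming them). The bases are then stacked and held fixed (the "supervised" setting), and a segment's features are the share of its total activation carried by each class's components. The function names `nmf_kl`, `train_class_bases` and `nmf_features`, as well as the synthetic "speech" and "laughter" spectra, are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero in the multiplicative updates


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """Lee-Seung multiplicative updates for the generalized KL divergence
    D(V || W H). If W is given it stays fixed and only H is estimated;
    `rank` is only used when W is learned."""
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        if learn_W:
            W *= ((V / (W @ H + EPS)) @ H.T) / (ones @ H.T + EPS)
    return W, H


def train_class_bases(class_spectrograms, rank_per_class, n_iter=200):
    """Learn one basis per class and stack them. Columns are L1-normalized
    so that summed activations measure reconstructed spectral energy."""
    bases, labels = [], []
    for label, V in class_spectrograms.items():
        W, _ = nmf_kl(V, rank_per_class, n_iter)
        bases.append(W / (W.sum(axis=0, keepdims=True) + EPS))
        labels += [label] * rank_per_class
    return np.hstack(bases), np.array(labels)


def nmf_features(V, W, labels, n_iter=200):
    """Decompose a segment with the fixed basis W and return, per class,
    the share of total activation carried by that class's components."""
    _, H = nmf_kl(V, W.shape[1], n_iter, W=W)
    energy = H.sum(axis=1)                           # activation mass per component
    classes = list(dict.fromkeys(labels.tolist()))   # classes in first-seen order
    total = energy.sum() + EPS
    return {c: float(energy[labels == c].sum() / total) for c in classes}


# Hypothetical toy data: "speech" occupies bins 0-9, "laughter" bins 10-19.
rng = np.random.default_rng(1)
F, T = 20, 60


def band(lo, hi, scale=1.0):
    V = np.zeros((F, T))
    V[lo:hi] = scale * rng.random((hi - lo, T))
    return V


W, labels = train_class_bases(
    {"speech": band(0, 10), "laughter": band(10, 20)}, rank_per_class=3
)
test_segment = band(0, 10) + band(10, 20, scale=0.1)  # mostly speech energy
feats = nmf_features(test_segment, W, labels)
print(feats)
```

In the setup the abstract describes, per-class features of this kind would be appended to the MFCC features before classification. The rank per class, the number of iterations and the choice of divergence are free parameters of this sketch, not values taken from the paper.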