Non-linguistic vocalisation recognition based on hybrid GMM-SVM approach

This paper describes an algorithm for detecting non-linguistic vocalisations, such as laughter and fillers, from acoustic features. The proposed algorithm combines the strengths of Gaussian mixture models (GMMs) with those of support vector machines (SVMs): three GMMs were trained, one each for garbage, laughter, and fillers, and an SVM was then trained in the resulting GMM score space. A series of experiments was run to tune the parameters of the algorithm, using data sets derived from the SSPNet Vocalisation Corpus (SVC) provided for the Social Signals Sub-Challenge of the INTERSPEECH 2013 Computational Paralinguistics Challenge. The results showed a substantial increase in the unweighted average of the area under the receiver operating characteristic curve (UAAUC) over the baseline (from 87.6% to over 94% on the development set), confirming the effectiveness of the proposed method.

Index Terms: paralinguistics, social signals, laughter detection, filler, support vector machines, Gaussian mixture models, cepstrum
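
The abstract does not give the model configurations, so what follows is only a minimal sketch of the GMM score-space idea under loudly stated assumptions: each class GMM has diagonal covariances and already-estimated parameters (EM training is omitted), a frame's score vector is its log-likelihood under the garbage, laughter and filler GMMs, and a linear SVM trained by dual coordinate descent stands in for whatever kernel and SVM tool the authors actually used. The helpers `gmm_log_likelihood`, `score_vector`, `train_linear_svm` and `svm_decision`, the one-vs-rest (laughter vs. rest) setup, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of a diagonal-covariance GMM:
# a list of (weight, means, variances) tuples, one per mixture component.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log p(x | gmm) for a diagonal-covariance GMM, combined with log-sum-exp."""
    terms = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            ll -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        terms.append(ll)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def score_vector(x: Sequence[float], gmms: List[GMM]) -> List[float]:
    """Map a feature vector into GMM score space: one log-likelihood per class GMM."""
    return [gmm_log_likelihood(x, g) for g in gmms]


def train_linear_svm(X: List[List[float]], y: List[int],
                     C: float = 1.0, epochs: int = 2000) -> List[float]:
    """Linear SVM (hinge loss) trained by dual coordinate descent; y in {-1, +1}.

    A constant 1.0 is appended to every vector so that the last weight acts
    as a bias term (which is therefore regularised like the other weights).
    """
    Xb = [list(x) + [1.0] for x in X]
    alpha = [0.0] * len(Xb)          # dual variables, kept inside [0, C]
    w = [0.0] * len(Xb[0])           # primal weights, w = sum_i alpha_i y_i x_i
    for _ in range(epochs):
        for i, xi in enumerate(Xb):
            q_ii = sum(v * v for v in xi)
            grad = y[i] * sum(wj * v for wj, v in zip(w, xi)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                w = [wj + delta * y[i] * v for wj, v in zip(w, xi)]
    return w


def svm_decision(w: List[float], s: Sequence[float]) -> float:
    """Signed SVM output for a score vector s (positive means target class)."""
    return sum(wj * v for wj, v in zip(w, list(s) + [1.0]))


# Toy 1-D data; single-component GMMs stand in for models trained with EM.
gmms = [
    [(1.0, [0.0], [1.0])],    # garbage
    [(1.0, [2.0], [1.0])],    # laughter
    [(1.0, [-2.0], [1.0])],   # filler
]
frames = [[-0.2], [0.0], [0.2], [1.8], [2.0], [2.2], [-2.2], [-2.0], [-1.8]]
labels = [-1, -1, -1, 1, 1, 1, -1, -1, -1]   # laughter vs. rest
scores = [score_vector(f, gmms) for f in frames]
w = train_linear_svm(scores, labels)
print([svm_decision(w, s) > 0 for s in scores])
```

The SVM sees only the three GMM scores, never the raw features. In the toy data the laughter-minus-garbage log-likelihood difference equals 2x - 2, so a linear boundary in score space is enough to isolate the laughter frames.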
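
The evaluation measure can be made concrete as well. The sketch below assumes that the unweighted average runs over the two target classes (laughter and filler), each scored one-vs-rest on frame-level detector outputs, and computes each AUC as the Mann-Whitney statistic. The function names and the toy scores are illustrative only, not taken from the paper or the challenge tooling.

```python
from typing import Dict, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive frame outscores a randomly chosen negative one (ties = 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def uaauc(per_class: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Unweighted mean of the per-class (one-vs-rest) AUCs."""
    values = [auc(scores, labels) for scores, labels in per_class.values()]
    return sum(values) / len(values)


# Toy frame-level labels and detector outputs (illustrative numbers only).
frame_labels = ["garbage", "laughter", "laughter", "filler", "garbage", "filler"]
laughter_scores = [0.1, 0.9, 0.6, 0.3, 0.7, 0.2]
filler_scores = [0.2, 0.1, 0.3, 0.8, 0.1, 0.4]
result = uaauc({
    "laughter": (laughter_scores, [int(c == "laughter") for c in frame_labels]),
    "filler": (filler_scores, [int(c == "filler") for c in frame_labels]),
})
print(result)  # 0.9375
```

AUC depends only on how the scores rank the frames, not on any decision threshold. The pairwise loop is quadratic in the number of frames, so a rank-based formula would be preferable for corpus-sized data.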
