Detecting Filled Pauses and Lengthenings in Russian Spontaneous Speech Using SVM

Spontaneous speech differs from other types of speech in many ways, and the presence of speech disfluencies is one of its most prominent characteristics. These phenomena are an important feature of human-human communication and, at the same time, a challenging obstacle for speech processing tasks. This paper reports experimental results on the automatic detection of filled pauses and sound lengthenings based on automatically extracted acoustic features. We performed machine learning experiments using a support vector machine (SVM) classifier on a mixed, quality-diverse corpus of Russian spontaneous speech. We applied Gaussian filtering and morphological opening to post-process the probability estimates from the SVM classifier. As a result, we achieved an F1-score of 0.54, with precision and recall of 0.55 and 0.53, respectively.
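The post-processing step can be illustrated with a minimal, dependency-free sketch. It assumes one SVM probability estimate per frame and is not the authors' implementation: the function names and the values of `sigma`, `threshold`, and `min_frames` are hypothetical choices made for illustration, since the abstract does not state them. The sketch smooths the probabilities with a Gaussian kernel, thresholds them, and applies a 1-D morphological opening (erosion followed by dilation) to discard detections shorter than the structuring element.

```python
import math


def gaussian_smooth(probs, sigma):
    """Smooth per-frame probabilities with a truncated, normalized Gaussian."""
    radius = max(1, int(round(3 * sigma)))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(probs)
    smoothed = []
    for i in range(n):
        acc, norm = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize at the sequence borders
                acc += w * probs[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def binary_opening(mask, width):
    """1-D opening with a flat structuring element of `width` frames."""
    n = len(mask)
    # Erosion: frame i survives only if the window [i, i + width) is all True.
    eroded = [i + width <= n and all(mask[i:i + width]) for i in range(n)]
    # Dilation with the reflected element restores the surviving runs.
    return [any(eroded[max(0, i - width + 1):i + 1]) for i in range(n)]


def mask_to_segments(mask):
    """Convert a boolean mask to (start, end) frame intervals, end exclusive."""
    segments, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments


def detect_events(probs, sigma=1.0, threshold=0.5, min_frames=5):
    """Assumed pipeline: Gaussian filtering, thresholding, then opening."""
    smoothed = gaussian_smooth(probs, sigma)
    mask = [p > threshold for p in smoothed]
    return mask_to_segments(binary_opening(mask, min_frames))


# Toy input: a 3-frame burst and a 12-frame run of high probability.
toy_probs = [0.1] * 10 + [0.9] * 3 + [0.1] * 10 + [0.9] * 12 + [0.1] * 10
print(detect_events(toy_probs))
```

In this toy input the short burst is too brief to survive the opening, while the longer run is kept with its boundaries intact. In practice the kernel width and the minimum duration would be tuned on held-out data.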
