A language-identification inspired method for spontaneous speech detection

Most spontaneous speech detection systems rely on disfluency analysis or on a combination of acoustic and linguistic features. This paper presents a method that treats spontaneous speech as a specific language, which can be identified with language-recognition methods such as shifted delta cepstrum parameters, dimensionality reduction by linear discriminant analysis, and a factor-analysis-based filtering process. Experiments are conducted on the French EPAC corpus. On a task with three spontaneity levels, this approach obtains a relative gain of about 22% in identification rate over the classical MFCC/GMM technique. We then combine these techniques with others previously proposed for spontaneous speech detection. The final system obtains a recognition rate of 65% on highly spontaneous speech segments.
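
As an illustration of the shifted delta cepstrum parameters mentioned above, the following is a minimal sketch of how such features are commonly computed in language identification. It is not the paper's implementation. The function name `shifted_delta_cepstra`, the default values `d=1`, `p=3`, `k=7` (a configuration often used in language identification, not one stated in this abstract), and the edge-clamping strategy are all assumptions made for the example.

```python
def shifted_delta_cepstra(frames, d=1, p=3, k=7):
    """Stack k delta vectors, each shifted by p frames, for every time step.

    frames: list of cepstral vectors (lists of floats), one per frame.
    d: half-width of the delta window (delta = c[t + d] - c[t - d]).
    p: shift, in frames, between consecutive delta blocks.
    k: number of delta blocks stacked per output vector.
    Returns a list of vectors, each of length k * len(frames[0]).
    """
    n_frames = len(frames)
    last = n_frames - 1

    def frame_at(t):
        # Assumption: indices outside the signal are clamped to the edges.
        return frames[min(max(t, 0), last)]

    sdc = []
    for t in range(n_frames):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            # Delta computed around the shifted position t + i * p.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


# Hypothetical toy input: 30 frames of 2 cepstral coefficients each.
toy_frames = [[float(t), 2.0 * t] for t in range(30)]
toy_sdc = shifted_delta_cepstra(toy_frames, d=1, p=3, k=2)
```

Each output vector spans a longer temporal context than a plain delta feature, which is why these parameters are used to capture longer-term dynamics. In a full system, the stacked vectors would typically be appended to the static cepstral coefficients before modelling.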
