Multiresolution analysis applied to text-independent phone segmentation

Automatic speech segmentation is of fundamental importance in many speech applications. The most common implementations are based on hidden Markov models, which use statistical models of the phonetic units to align the data with a known transcription. This is an expensive and time-consuming process because of the huge amount of data needed to train the system. Text-independent speech segmentation procedures have been developed to overcome some of these problems. These methods detect transitions in the evolution of the time-varying features that represent the speech signal, so speech representation plays a central role in the segmentation task. In this work, two new speech parameterizations are proposed: one based on the continuous multiresolution entropy, using Shannon entropy, and one based on the continuous multiresolution divergence, using the Kullback-Leibler distance. These approaches have been compared with the classical Melbank parameterization. The proposed encodings significantly increase segmentation performance. The parameterization based on the continuous multiresolution divergence shows the best results, increasing the number of correctly detected boundaries and decreasing the number of erroneously inserted points. This suggests that parameterizations based on multiresolution information measures provide information related to acoustic features that take phonemic transitions into account.
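The two information measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the wavelet transform has already been computed and that `coeffs` holds the coefficients at a single scale, and the function names, window length, hop size, and number of histogram bins are all hypothetical choices made for illustration. Each sliding window is summarized by a smoothed histogram; the Shannon entropy of that histogram gives one feature, and the Kullback-Leibler divergence from the previous window's histogram gives the other.

```python
import math


def histogram_probs(values, bins, lo, hi):
    """Add-one smoothed histogram of `values` over `bins` equal cells on [lo, hi]."""
    counts = [1.0] * bins  # smoothing keeps every cell non-zero, so KL stays finite
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(p):
    """Shannon entropy (in nats) of a strictly positive distribution."""
    return -sum(pi * math.log(pi) for pi in p)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sliding_measures(coeffs, win=64, hop=32, bins=16):
    """Per-window entropy, and divergence of each window from the previous one.

    `coeffs` is assumed to be the wavelet coefficients at one scale;
    repeating the call for every scale would give one feature track per scale.
    """
    lo, hi = min(coeffs), max(coeffs)
    entropies, divergences = [], []
    prev = None
    for start in range(0, len(coeffs) - win + 1, hop):
        p = histogram_probs(coeffs[start:start + win], bins, lo, hi)
        entropies.append(shannon_entropy(p))
        # The first window has no predecessor, so its divergence is set to 0.
        divergences.append(0.0 if prev is None else kl_divergence(p, prev))
        prev = p
    return entropies, divergences
```

In this sketch, an abrupt change in the statistics of the coefficients shows up as a jump in the divergence track, which is the kind of transition a text-independent segmenter would look for. How the measures are combined across scales and turned into boundary decisions is left open here.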
