Automatic Phonetic Segmentation for a Speech Corpus of Hebrew

This paper presents our study of several phonetic segmentation methods based on hidden Markov models (HMMs), evaluated on a Hebrew speech corpus. We investigated fully automatic segmentation methods that use only the corpus to be segmented and automatically generated phonetic transcriptions. We also propose a new method for correcting phonetic boundaries based on the spectral variation of the speech signal. The proposed method raised the percentage of automatically placed boundaries with an error smaller than 5, 10 and 20 ms from 30.2%, 59.5% and 86.2% for the baseline HMM segmentation system to 52.3%, 76.3% and 90.7%, respectively.
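The abstract does not say how the spectral variation is computed or how far a boundary may be moved. The sketch below is illustrative only and should not be read as the paper's algorithm. It assumes one common spectral variation function (SVF) variant: the Euclidean distance between the mean feature vectors of the k frames before and the k frames after a candidate boundary. It then applies a simple correction rule that moves each HMM boundary to the SVF peak within ±`search` frames. It also computes the evaluation measure quoted above, which is the percentage of automatic boundaries whose error is smaller than 5, 10 and 20 ms. The function names and the parameters `k` and `search` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. cepstral coefficients) per frame


def spectral_variation(frames: List[Frame], k: int = 2) -> List[float]:
    """Assumed SVF variant: for boundary t (between frame t-1 and frame t),
    the Euclidean distance between the mean of the k frames before t and
    the mean of the k frames starting at t. Edges without k frames get 0."""
    n, dim = len(frames), len(frames[0])
    svf = [0.0] * (n + 1)
    for t in range(k, n - k + 1):
        left = [sum(f[d] for f in frames[t - k:t]) / k for d in range(dim)]
        right = [sum(f[d] for f in frames[t:t + k]) / k for d in range(dim)]
        svf[t] = math.dist(left, right)
    return svf


def correct_boundaries(boundaries: List[int], svf: List[float],
                       search: int = 3) -> List[int]:
    """Hypothetical correction rule: move each HMM boundary (a frame index)
    to the SVF maximum within +/- search frames; ties go to the closer frame."""
    corrected = []
    for b in boundaries:
        lo, hi = max(0, b - search), min(len(svf) - 1, b + search)
        corrected.append(max(range(lo, hi + 1),
                             key=lambda t: (svf[t], -abs(t - b))))
    return corrected


def boundary_accuracy(auto_ms: List[float], ref_ms: List[float],
                      tolerances: Sequence[float] = (5, 10, 20)) -> Dict[float, float]:
    """Percentage of automatic boundaries whose absolute error (ms) is smaller
    than each tolerance. Assumes auto_ms[i] and ref_ms[i] mark the same boundary."""
    assert auto_ms and len(auto_ms) == len(ref_ms)
    errors = [abs(a - r) for a, r in zip(auto_ms, ref_ms)]
    return {tol: 100.0 * sum(e < tol for e in errors) / len(errors)
            for tol in tolerances}


if __name__ == "__main__":
    # Toy 1-D "spectrum" with an abrupt change at frame 5.
    frames = [[0.0]] * 5 + [[1.0]] * 5
    svf = spectral_variation(frames, k=2)
    print(correct_boundaries([7], svf, search=3))  # misplaced boundary snaps to 5
    print(boundary_accuracy([10, 27, 50], [12, 20, 80]))
```

To compare against reference labels in milliseconds, multiply corrected frame indices by the frame shift. The frame shift is another assumption, and a shift finer than 5 ms is needed for the 5 ms tolerance to be meaningful. This sketch does not stop adjacent corrected boundaries from crossing. A practical system would restrict each search window to the region between neighbouring boundaries.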
