A method for the automatic segmentation of speech into phones is described. The incoming utterance is split into more or less stationary parts, and these parts are labelled as phones using the phonetic transcription of the utterance. An implicit segmentation algorithm splits the utterance into segments on the basis of the degree of similarity between the frequency spectra of neighboring frames. An explicit algorithm also segments the utterance, but on the basis of the degree of similarity between the frequency spectra of its frames and reference spectra. A combination algorithm compares the two segmentation results and produces the final segmentation. Automatically determined phone boundaries are compared with manually determined ones, and the result of a perception test is described.
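The implicit segmentation idea, placing boundaries where the spectra of neighboring frames are least similar, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the described method: the function names `log_spectra` and `implicit_boundaries`, the Hann window, the frame and hop lengths, the Euclidean distance between log-magnitude spectra, and the relative threshold `rel_threshold`.

```python
import numpy as np


def log_spectra(signal, frame_len=256, hop=128):
    """Return one log-magnitude spectrum per frame (rows = frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Small floor avoids log(0) in empty frequency bins.
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)


def implicit_boundaries(spectra, rel_threshold=0.5):
    """Return frame indices where neighboring spectra differ most.

    Assumed rule (not from the source): a boundary is a local maximum
    of the frame-to-frame distance that reaches at least
    rel_threshold times the largest distance in the utterance.
    """
    # dist[i] is the distance between frame i and frame i + 1.
    dist = np.linalg.norm(np.diff(spectra, axis=0), axis=1)
    if len(dist) == 0 or dist.max() == 0:
        return []
    threshold = rel_threshold * dist.max()
    boundaries = []
    for i in range(len(dist)):
        left = dist[i - 1] if i > 0 else -np.inf
        right = dist[i + 1] if i + 1 < len(dist) else -np.inf
        if dist[i] >= threshold and dist[i] > left and dist[i] >= right:
            boundaries.append(i + 1)  # boundary sits before frame i + 1
    return boundaries
```

Calling `implicit_boundaries(log_spectra(x))` on a sampled signal `x` yields candidate boundary frames. An explicit algorithm in the sense described above would compare each frame with reference spectra rather than with its neighbor, and a combination step would then reconcile the two sets of boundaries.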