A system for automatic alignment of phonetic transcriptions with continuous speech has been developed. The speech signal is first segmented into broad classes using a non-parametric pattern classifier. A knowledge-based dynamic programming algorithm then aligns the broad classes with the phonetic transcription. These broad classes provide "islands of reliability" for more detailed segmentation and refinement of boundaries. By performing alignment at the phonetic level, the system can often tolerate inter- and intra-speaker variability. The system was evaluated on sixty sentences spoken by three speakers, two male and one female. 93% of the segments were mapped to only one phoneme, and in 70% of cases the offset between the boundary found by the automatic alignment system and the one placed by a hand transcriber was less than 10 ms. Performance could be improved further by applying additional heuristic rules.
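The alignment step described above can be illustrated with a minimal sketch. It is not the authors' knowledge-based algorithm: it substitutes a plain edit-distance style dynamic program, and the phoneme-to-broad-class table, the class names, the function name `align`, and the unit costs are all assumptions made for illustration.

```python
# Hypothetical phoneme-to-broad-class table (illustrative only, not from the paper).
BROAD_CLASS = {
    "s": "fricative", "f": "fricative",
    "iy": "vowel", "aa": "vowel",
    "n": "nasal", "m": "nasal",
    "t": "stop", "k": "stop",
}


def align(segments, phonemes, gap_cost=1.0, mismatch_cost=1.0):
    """Align broad-class segment labels with a phoneme string.

    Returns (total_cost, pairs), where pairs is a list of
    (segment_index, phoneme_index); None marks a skipped item.
    The costs are assumed values, not ones reported in the paper.
    """
    n, m = len(segments), len(phonemes)

    def sub_cost(i, j):
        # Zero cost when the segment label equals the class expected for the phoneme.
        expected = BROAD_CLASS.get(phonemes[j])
        return 0.0 if segments[i] == expected else mismatch_cost

    # cost[i][j]: cheapest alignment of the first i segments with the first j phonemes.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(i - 1, j - 1),  # pair segment with phoneme
                cost[i - 1][j] + gap_cost,                    # skip a segment
                cost[i][j - 1] + gap_cost,                    # skip a phoneme
            )

    # Trace back from the end to recover which items were paired.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub_cost(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

For example, `align(["silence", "fricative", "vowel", "nasal"], ["s", "iy", "n"])` leaves the leading silence segment unpaired and pairs the remaining three segments with the three phonemes in order. A system of the kind described would presumably replace the fixed unit costs with phonetic knowledge and then refine boundaries within each aligned region.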