A procedure for automatic alignment of phonetic transcriptions with continuous speech

A system for automatic alignment of phonetic transcriptions with continuous speech has been developed. The speech signal is first segmented into broad classes using a non-parametric pattern classifier. A knowledge-based dynamic programming algorithm then aligns the broad classes with the phonetic transcriptions. These broad classes provide "islands of reliability" for more detailed segmentation and refinement of boundaries. By doing alignment at the phonetic level, the system can often tolerate inter- and intra-speaker variability. The system was evaluated on sixty sentences spoken by three speakers, two male and one female. Of the segments, 93% are mapped into only one phoneme, and in 70% of cases the offset between the boundary found by the automatic alignment system and the boundary marked by a hand transcriber is less than 10 ms. Performance could be improved further by applying additional heuristic rules.
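The alignment step can be illustrated with a minimal sketch. It is not the knowledge-based algorithm described above: it is a plain dynamic programming alignment with a 0/1 mismatch cost, and the broad-class inventory, the phoneme-to-class table `BROAD_CLASS`, the function name `align`, and the `mismatch_cost` parameter are all assumptions made for illustration. Each broad-class segment is assigned to exactly one phoneme, the assignment is monotonic, and every phoneme receives at least one segment.

```python
# Hypothetical phoneme-to-broad-class table (illustrative, not from the paper).
BROAD_CLASS = {
    "s": "fricative", "z": "fricative", "f": "fricative",
    "t": "stop", "k": "stop", "p": "stop", "d": "stop",
    "n": "nasal", "m": "nasal",
    "iy": "vowel", "ae": "vowel", "ah": "vowel", "eh": "vowel",
}


def align(segments, phonemes, mismatch_cost=1.0):
    """Assign each broad-class segment to one phoneme of the transcription.

    segments: broad-class labels produced by the classifier, in time order.
    phonemes: phoneme symbols of the transcription, in order.
    Returns (phoneme assigned to each segment, total alignment cost).
    """
    n, m = len(segments), len(phonemes)
    if m == 0 or n < m:
        raise ValueError("need at least one segment per phoneme")

    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # cost[i][j]: best cost ending at (i, j)
    back = [[0] * m for _ in range(n)]     # back[i][j]: phoneme index used at i - 1

    def local(i, j):
        # Zero cost when the segment label matches the phoneme's broad class.
        return 0.0 if segments[i] == BROAD_CLASS[phonemes[j]] else mismatch_cost

    cost[0][0] = local(0, 0)
    for i in range(1, n):
        for j in range(min(i, m - 1) + 1):
            stay = cost[i - 1][j]                          # same phoneme continues
            advance = cost[i - 1][j - 1] if j > 0 else inf  # move to next phoneme
            if stay <= advance:
                best, prev = stay, j
            else:
                best, prev = advance, j - 1
            if best < inf:
                cost[i][j] = best + local(i, j)
                back[i][j] = prev

    # Trace back from the last segment, which must map to the last phoneme.
    path = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        path[i] = j
        j = back[i][j]
    return [phonemes[k] for k in path], cost[n - 1][m - 1]


if __name__ == "__main__":
    labels, total = align(["fricative", "vowel", "vowel", "stop"], ["s", "ae", "t"])
    print(labels, total)
```

In this sketch, segments whose label matches the broad class of their assigned phoneme play the role of anchors; a fuller system would replace the 0/1 cost with knowledge-based rules and then refine the boundaries within each aligned region.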