Robust automatic transcription of English speech corpora

This research assesses the ability of a Hidden Markov Model (HMM) based method to generate accurate and reliable automatic phone-level transcriptions for a small-vocabulary speech corpus. In particular, we are interested in a system that requires only an orthographic transcription of the target corpus and can be bootstrapped from models trained on an independent, phonetically transcribed corpus. The question we ask is whether reliable results can be achieved despite a large mismatch between the bootstrapping corpus (US English) and the target corpus (British English). The quality of the automatic transcriptions is judged by comparison with manual transcriptions produced by several independent transcribers. Different training strategies are compared for handling the inter-speaker variability in the target corpus. The transcriptions generated by the most reliable system deviate from the average manual transcription by 20 ms on average.
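
The comparison against an average manual transcription can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the deviation is measured at phone boundaries, that every transcriber marks the same sequence of boundaries, and that the reported figure is a mean absolute difference. The function names and the sample boundary times are hypothetical.

```python
from statistics import mean


def average_manual_boundaries(manual_ms):
    """Average each boundary time (in ms) across transcribers.

    manual_ms holds one list of boundary times per transcriber.
    Assumption: all transcribers mark the same boundaries in the same order.
    """
    n = len(manual_ms[0])
    if any(len(times) != n for times in manual_ms):
        raise ValueError("all transcribers must mark the same number of boundaries")
    # zip(*...) groups the i-th boundary from every transcriber together.
    return [mean(times) for times in zip(*manual_ms)]


def mean_abs_deviation_ms(automatic_ms, manual_ms):
    """Mean absolute deviation (ms) of automatic boundaries from the manual average.

    Assumption: mean absolute difference is the deviation measure.
    """
    reference = average_manual_boundaries(manual_ms)
    if len(automatic_ms) != len(reference):
        raise ValueError("automatic and manual boundary counts differ")
    return mean(abs(a - r) for a, r in zip(automatic_ms, reference))


# Hypothetical boundary times in milliseconds, for illustration only.
manual = [[100, 250], [120, 270]]  # two transcribers, two boundaries each
automatic = [130, 240]             # boundaries from the automatic system
deviation = mean_abs_deviation_ms(automatic, manual)
```

Averaging the manual boundaries first, rather than comparing against each transcriber separately, keeps disagreement among transcribers from inflating the score.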