Lightly Supervised Acoustic Model Training

Although tremendous progress has been made in speech recognition technology, with today's state-of-the-art systems capable of transcribing unrestricted continuous speech from broadcast data, these systems rely on the availability of large amounts of manually transcribed acoustic training data. Obtaining such data is both time-consuming and expensive, requiring trained human annotators and substantial amounts of supervision. In this paper we describe some recent experiments using lightly supervised techniques for acoustic model training in order to reduce the system development cost. The strategy we investigate uses a speech recognizer to transcribe unannotated broadcast news data, and optionally combines the hypothesized transcription with associated, but unaligned, closed captions or transcripts to create labeled training data. We show that this approach can dramatically reduce the cost of building acoustic models.
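One simple way to combine a recognizer hypothesis with unaligned closed captions is to align the two word sequences and keep only stretches where they agree, on the assumption that agreement marks reliably labeled audio. The sketch below illustrates that idea with Python's standard `difflib.SequenceMatcher`; the function name `agreed_segments` and the `min_run` threshold are hypothetical choices for illustration, not necessarily the filtering rule used in this work. It operates on word lists only; a real pipeline would also carry word time stamps so the selected segments can be cut from the audio.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def agreed_segments(
    hypothesis: List[str],
    caption: List[str],
    min_run: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges into `hypothesis` where the
    recognizer output and the caption agree on at least `min_run`
    consecutive words.

    Hypothetical filtering rule: runs of agreement are treated as
    reliably labeled training data; everything else is discarded.
    """
    # autojunk=False so frequent words such as "the" are never ignored
    matcher = SequenceMatcher(a=hypothesis, b=caption, autojunk=False)
    segments = []
    for block in matcher.get_matching_blocks():
        # The final block is a size-0 sentinel; min_run also filters it out
        if block.size >= min_run:
            segments.append((block.a, block.a + block.size))
    return segments


if __name__ == "__main__":
    hyp = "the president said today that the talks will resume next week".split()
    cap = "president said that talks will resume next week".split()
    for start, end in agreed_segments(hyp, cap):
        print(" ".join(hyp[start:end]))  # prints "talks will resume next week"
```

Raising `min_run` trades the amount of retained data against label reliability; the right setting would have to be chosen empirically.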