Automatically transcribing meetings using distant microphones

In this paper, we describe our efforts to develop acoustic models suitable for distant-microphone automatic speech recognition. Our goal is to investigate how to optimize the performance of a system trained on a combination of close-talking and distant-microphone data, while assuming as little as possible about the configuration of the (multiple) distant microphones, in order to avoid ad-hoc assumptions and lengthy calibration runs. We evaluated our system in NIST's RT-04S "Meeting" speech-to-text evaluation, in which speech was recorded at several sites with varying numbers of different table-top microphones, but without microphone arrays. Body-mounted microphones provide baseline numbers for distant-microphone ASR performance and allow comparison of meeting speech with other spontaneous speech data.