Automatic processing of audio lectures for information retrieval: vocabulary selection and language modeling

This paper describes our initial progress towards a system for automatically transcribing and indexing audio-visual academic lectures for audio information retrieval. We investigate how to combine generic spoken data sources with subject-specific text sources when processing lecture speech. In addition to word recognition experiments, we perform audio information retrieval simulations to characterize retrieval performance when using errorful automatic transcriptions. Given an appropriately selected vocabulary, we observe that good retrieval performance can be obtained even at high recognition error rates. For language model training, we observe that adding spontaneous speech data to subject-specific written material yields more accurate transcriptions but has only a marginal effect on retrieval performance.
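The combination of data sources described above can be illustrated with a minimal sketch. The following is not the paper's actual method, but a toy example of linearly interpolating unigram language models estimated from a generic spontaneous-speech corpus and a subject-specific text corpus; the corpora, the function name `interpolate_unigrams`, and the mixing weight are all hypothetical choices for illustration.

```python
from collections import Counter

def interpolate_unigrams(corpus_a, corpus_b, lam=0.5):
    """Linearly interpolate unigram probabilities from two token lists.

    lam is the weight on corpus_a; (1 - lam) goes to corpus_b.
    """
    def unigram(tokens):
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    pa, pb = unigram(corpus_a), unigram(corpus_b)
    vocab = set(pa) | set(pb)  # union vocabulary over both sources
    return {w: lam * pa.get(w, 0.0) + (1 - lam) * pb.get(w, 0.0)
            for w in vocab}

# Toy corpora (hypothetical): spontaneous lecture speech vs. textbook text
spoken = "um so the the matrix is uh symmetric".split()
textbook = "the matrix is symmetric and positive definite".split()
mixed = interpolate_unigrams(spoken, textbook, lam=0.3)
```

Words such as fillers ("um", "uh") receive nonzero probability only through the spoken component, while subject-specific terms ("definite") are covered by the text component; the interpolated model assigns mass to both, and its probabilities still sum to one.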