EXPERIMENTS IN INFORMATION RETRIEVAL FROM SPOKEN DOCUMENTS

This paper describes the experiments performed as part of the TREC-97 Spoken Document Retrieval Track. The task was to pick the correct document from 35 hours of recognized speech documents, based on a text query describing exactly one document. Among the experiments we described here are: Vocabulary size experiments to assess the effect of words missing from the speech recognition vocabulary; experiments with speech recognition using a stemmed language model; using confidence annotations that estimate of the correctness of each recognized word; using multiple hypotheses from the recognizer. And finally we also measured the effects of corpus size on the SDR task. Despite fairly high word error rates, information retrieval performance was only slightly degraded for speech recognizer transcribed documents.