Transcription and indexation of broadcast data

We report on recent research on transcribing and indexing broadcast news data for information retrieval purposes. The system described combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with text-based IR methods. Experimental results are reported in terms of recognition word error rate and mean average precision for both the TREC SDR98 (100 h) and SDR99 (600 h) data sets. With query expansion using commercial transcripts, the mean average precision obtained on automatic transcriptions is comparable to that obtained on manual reference transcriptions, at a word error rate of 21.5% measured on a 10-hour data subset.
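For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how word error rate and mean average precision are commonly computed. It is illustrative only and is not the scoring software used in the evaluations: the function names, the whitespace tokenization, and the toy data are assumptions, and official scoring typically applies text normalization that is omitted here.

```python
from typing import Hashable, Sequence, Set, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision values at the rank of each relevant document,
    normalized by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical, for illustration only).
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d1", "d2", "d3", "d4"], {"d2"}),
]
map_score = mean_average_precision(toy_runs)
```

In the toy example the hypothesis contains one substitution and one deletion against six reference words, and the two queries have average precisions of 5/6 and 1/2 respectively.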