A system for speech-driven information retrieval

In this paper we present a system that allows users to search for information in a document collection using a spoken query. The system combines a speech recognizer with an information retrieval engine and works for Spanish. We evaluated the system on the CLEF'01 test set, extended to include spoken queries. To reduce the out-of-vocabulary (OOV) word problem, we propose an adaptation of the vocabulary and the language model. To reduce errors caused by foreign-language words, we expanded our pronunciation lexicon to include the pronunciations of English words. Experiments showed a relative gain in retrieval precision of 6.34%, a relative reduction in OOV word rate of 24.71%, and a relative reduction in word error rate (WER) of 10.87%.
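For readers unfamiliar with the reported measures, the following minimal sketch shows how WER, OOV word rate, and a relative reduction are commonly computed. It is an illustration under stated assumptions, not the evaluation tooling used in this work: the function names are hypothetical, WER is taken as the word-level edit distance divided by the reference length, and the toy inputs are not drawn from the experiments.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def oov_rate(query_words, vocabulary):
    """Fraction of query word tokens missing from the recognizer vocabulary."""
    if not query_words:
        raise ValueError("query_words must not be empty")
    missing = sum(1 for word in query_words if word not in vocabulary)
    return missing / len(query_words)


def relative_reduction(baseline, adapted):
    """Relative reduction of an error measure with respect to its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - adapted) / baseline


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    print(word_error_rate("a b c d", "a x c"))
    print(oov_rate(["a", "b", "c", "d"], {"a", "b"}))
    print(relative_reduction(0.20, 0.15))
```

A relative reduction compares the adapted system against the baseline value of the same measure, so it is independent of the absolute scale of that measure.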