3 Comparing Improved Language Models for Sentence Retrieval in Question Answering

A retrieval system is an important part of a question answering framework. It reduces the number of documents to be considered when searching for an answer. For further refinement, the documents are split into smaller chunks to cope with topic variability in larger documents. In our case, we divided the documents into single sentences. A language model based approach was then used to re-rank the sentence collection. For this purpose, we developed a new language model toolkit. It implements all standard language modeling techniques and is more flexible than other tools in terms of backing-off strategies, model combinations and design of the retrieval vocabulary. With the aid of this toolkit we conducted re-ranking experiments with standard language model based smoothing methods. On top of these algorithms we developed some new, improved models including dynamic stop word reduction and stemming. We also experimented with query expansion depending on the type of a query. On a TREC corpus, we demonstrate that our proposed approaches provide a performance superior to the standard methods. In terms of

Proceedings of the 17th Meeting of Computational Linguistics in the Netherlands. Edited by: Peter Dirix, Ineke Schuurman, Vincent Vandeghinste, and Frank Van Eynde. Copyright © 2007 by the individual authors.
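As an illustration of language model based sentence re-ranking, the following minimal sketch scores each candidate sentence by the smoothed query likelihood and sorts the sentences accordingly. It is not the toolkit described in this paper. Jelinek-Mercer interpolation is chosen here only as one well-known smoothing method, and the names `tokenize` and `rerank`, the interpolation weight `lam`, and the add-one smoothing of the collection model are assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def rerank(query, sentences, lam=0.5):
    """Sort sentences by log P(query | sentence), best first.

    The sentence model is linearly interpolated with a collection
    model (Jelinek-Mercer smoothing); lam is the weight given to
    the collection model.
    """
    tokenized = [tokenize(s) for s in sentences]

    # Collection statistics over all candidate sentences.
    collection = Counter(tok for sent in tokenized for tok in sent)
    collection_size = sum(collection.values())
    vocab_size = len(collection)

    query_terms = tokenize(query)
    scored = []
    for sentence, tokens in zip(sentences, tokenized):
        counts = Counter(tokens)
        length = len(tokens)
        score = 0.0
        for term in query_terms:
            # Maximum likelihood estimate from the sentence itself.
            p_sent = counts[term] / length if length else 0.0
            # Add-one smoothed collection model (an assumption of this
            # sketch) so that unseen query terms never yield log(0).
            p_coll = (collection[term] + 1) / (collection_size + vocab_size + 1)
            score += math.log((1 - lam) * p_sent + lam * p_coll)
        scored.append((score, sentence))

    # Higher log-likelihood means a better match for the query.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


if __name__ == "__main__":
    candidates = [
        "the cat sat on the mat",
        "paris is the capital of france",
        "dogs bark loudly",
    ]
    for sentence in rerank("capital of france", candidates):
        print(sentence)
```

Because single sentences are very short, most query terms are absent from any given sentence, so the collection model carries much of the probability mass. This is why the choice of smoothing method matters for sentence retrieval.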