Correction de césures et enrichissement de requêtes pour la recherche de livres

Digitized books are now a common source of information on the Web, however OCR sometimes introduces errors that can penalize Information Retrieval. In this paper we propose a method for correcting hyphenations and we analyse its impact on a standard book retrieval task. We also experiment query expansion with words extracted from the Wikipedia page related to the query. We show that there is a significant improvement over the state-of-the-art when using a large weighted list of words. MOTS-CLES : Livres numerises, cesures, enrichissement de requete, Wikipedia.