Use of probabilistic topic models for search

Abstract : This thesis solves a common issue in search applications. Typically, the user does not know exactly which terms are used in a document he is searching for. Several attempts have been made to overcome this issue by augmenting the document model and/or the query. In this thesis, a probabilistic topic model augments the document model. Probabilistic document models are formally introduced and inference methods are derived. It is shown how these models can be used for information retrieval tasks and how a search application can be implemented. A prototype was implemented and the implementation is tested and evaluated based on benchmark corpora. The evaluation provides empirical evidence that probabilistic document models improve the retrieval performance significantly, and shows which preprocessing steps should be made before applying the model.