Exploring LDA-Based Document Model for Geographic Information Retrieval

Latent Dirichlet Allocation (LDA), a formal generative model, has recently been used to improve ad-hoc information retrieval. However, its feasibility and effectiveness for geographic information retrieval have not been explored. This paper proposes an LDA-based document model to improve geographic information retrieval by combining the LDA model with a text retrieval model. The proposed model was evaluated on the GeoCLEF2007 collection. This work is part of the experiments of the Columbus Project of Microsoft Research Asia (MSRA) in GeoCLEF2007, a cross-language geographical retrieval track that is part of the Cross Language Evaluation Forum. This is the second time we have participated in this event. Since the queries in GeoCLEF2007 are similar to those in GeoCLEF2006, we reuse most of the methods that we used in GeoCLEF2006, including the MSRAWhitelist, MSRAExpansion, MSRALocation, and MSRAText approaches. The difference is that the MSRAManual approach is not included this time; we use MSRALDA instead. The results show that applying the LDA model to the GeoCLEF monolingual English task performs stably but needs further exploration.