Integration of PLSA into Probabilistic CLIR Model - Yokohama National University at NTCIR4 CLIR

In this paper, we propose a method for Cross-Language Information Retrieval (CLIR) that integrates Probabilistic Latent Semantic Analysis (PLSA) into a probabilistic CLIR model. PLSA is used to extract translation probabilities from a parallel corpus, and these probabilities are then used in the probabilistic CLIR model. Although the probabilistic CLIR model with PLSA is quite effective, its processing time is very long. We therefore introduce an approximation method based on a two-phased retrieval model to reduce the computational cost. Using this model, we submitted runs for Japanese-to-English bilingual retrieval in the CLIR task of NTCIR4.
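
The abstract does not say how the two phases are divided. As a rough illustration only, the sketch below assumes one common reading: a cheap scorer ranks the whole collection, and the expensive scorer (here standing in for the PLSA-based model) is applied only to the top k candidates. The names two_phase_retrieve, cheap_score, costly_score, and k are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer signature: (query, doc_id) -> relevance score.
Scorer = Callable[[str, str], float]


def two_phase_retrieve(
    query: str,
    doc_ids: List[str],
    cheap_score: Scorer,
    costly_score: Scorer,
    k: int,
) -> List[Tuple[str, float]]:
    """Rank documents in two phases (an assumed scheme, not the paper's).

    Phase 1 scores every document with the cheap scorer and keeps the
    top k. Phase 2 rescores only those k with the costly scorer.
    """
    # Phase 1: inexpensive ranking over the full collection.
    candidates = sorted(
        doc_ids, key=lambda d: cheap_score(query, d), reverse=True
    )[:k]

    # Phase 2: expensive rescoring restricted to the candidates.
    rescored = [(d, costly_score(query, d)) for d in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored
```

Under this assumption the costly scorer is called k times per query rather than once per document in the collection, which is where the saving would come from. The trade-off is that a document ranked below k in the first phase can never be recovered in the second.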