Approximate Dimension Reduction at NTCIR

We compared cross-language retrieval methods based on dimension reduction (latent semantic indexing) on the NTCIR-1 data. These methods all use a collection of parallel documents (translations or approximate translations) and very little, if any, linguistic knowledge. On NTCIR-1, we evaluated latent semantic indexing (LSI), local LSI, and approximate dimensional equalization (ADE). We found that local LSI and ADE performed best on this collection and were comparable to the best-performing systems reported elsewhere. We also ran ADE on the NTCIR-2 data and found that it fared considerably less well.