Query Expansion and Machine Translation for Robust Cross-Lingual Information Retrieval

In this paper, we describe the Information Retrieval subsystem of JAVELIN IV, a question-answering system that answers complex questions from multilingual sources. Our research focuses on strategies for query term extraction, translation, filtering, expansion, and weighting, including a novel alias expansion technique that uses lexico-syntactic patterns learned with a weakly supervised algorithm. In the NTCIR7 IR4QA evaluation, our retrieval system achieved a mean average precision (MAP) of 59% in both the Chinese-to-Chinese and Japanese-to-Japanese subtasks. We provide a rationale for the retrieval system design and present a detailed error analysis of our formal run results.