Interactive Cross-Language Document Selection

The problem of finding documents written in a language that the searcher cannot read is perhaps the most challenging application of cross-language information retrieval technology. In interactive applications, that task involves at least two steps: (1) the machine locates promising documents in a collection that is larger than the searcher could scan, and (2) the searcher recognizes documents relevant to their intended use from among those nominated by the machine. This article presents the results of experiments designed to explore three techniques for supporting interactive relevance assessment: (1) full machine translation, (2) rapid term-by-term translation, and (3) focused phrase translation. Machine translation was found to better support this task than term-by-term translation, and focused phrase translation further improved recall without an adverse effect on precision. The article concludes with an assessment of the strengths and weaknesses of the evaluation framework used in this study and some remarks on implications of these results for future evaluation campaigns.
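To make the contrast between the translation conditions concrete, the sketch below illustrates the general idea behind rapid term-by-term gloss translation: each source-language term is replaced with one or more dictionary translations, with untranslated terms passed through as a simple backoff. This is a minimal sketch under assumed conditions, not the study's implementation; the toy dictionary and the names GLOSS_DICT, gloss_translate, and max_glosses are illustrative only.

```python
# A minimal sketch of term-by-term gloss translation for document selection.
# Assumption: a toy French-to-English dictionary stands in for the large
# bilingual resources a real system would use.

from typing import Dict, List

# Hypothetical toy dictionary; glosses are ordered by assumed preference.
GLOSS_DICT: Dict[str, List[str]] = {
    "maison": ["house", "home"],
    "blanche": ["white"],
    "politique": ["policy", "politics"],
}


def gloss_translate(tokens: List[str], max_glosses: int = 2) -> str:
    """Replace each source token with up to max_glosses dictionary
    translations; unknown tokens pass through unchanged (a simple
    backoff so the searcher still sees names and cognates)."""
    out = []
    for tok in tokens:
        glosses = GLOSS_DICT.get(tok.lower())
        if glosses:
            # Show alternative glosses side by side, as a gloss-style
            # display might, rather than choosing a single translation.
            out.append("/".join(glosses[:max_glosses]))
        else:
            out.append(tok)  # backoff: display the untranslated term
    return " ".join(out)


if __name__ == "__main__":
    doc = ["La", "Maison", "Blanche", "politique"]
    print(gloss_translate(doc))
    # -> La house/home white policy/politics
```

A gloss display of this kind is fast and transparent, but, as the experiments reported here suggest, the lack of word-order and phrase-level context is exactly what full machine translation and focused phrase translation aim to supply.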
