Chapter XXXIV Cross-Language Information Retrieval on the Web

The Web stands today as the world's largest source of public information. Its magnitude can also be seen as a drawback, however: users now routinely face the problem of retrieving documents that may be written in any language through queries expressed in a single source language. And although Information Retrieval (IR) depends on the availability of digital collections, this key aspect is no longer the only concern. It is time for the multicultural society of the Internet to make use of new technologies such as Cross-Language Information Retrieval (CLIR). Whereas classical IR is a field that embraces retrieval models, evaluation, query languages and document indexing involving "small" collections of documents, modern IR tends to focus on Internet search engines, mark-up languages, multimedia content, the distribution of collections, user interaction and multilingual systems. Thus, CLIR may border on work in the following fields: information retrieval, natural language processing, machine translation and abstracting, speech processing, the interpretation of document images, and human-computer interaction. "Given a query in any medium and any language, select relevant items from a multilingual multimedia collection which can be in any medium and any language, and present them in the style or order most likely to be useful to the querier, with identical or near identical objects in different media or languages appropriately identified" (Hull & Oard, 1997). This statement sums up the main objective of CLIR, which was acknowledged as an independent research subfield roughly a decade ago; at present a number of international CLIR conferences take place around the world. The most important of these are TREC (Text REtrieval Conference) in the US; NTCIR (NII-NACSIS Test Collection for IR Systems) in Asia; and CLEF (Cross-Language Evaluation Forum) in Europe. This chapter attempts to
