Alternative Approaches for Cross-Language Text Retrieval

The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus, automated systems capable of detecting useful documents are finding widespread application.

With even a small number of languages it can be inconvenient to issue the same query repeatedly in every language, so users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems. And since reading ability in a language does not always imply fluent writing ability in that language, such users will likely find cross-language text retrieval particularly useful for languages in which they are less confident of their ability to express their information needs effectively.

The use of such systems can also be beneficial if the user is able to read only a single language. For example, when only a small portion of the document collection will ever be examined by the user, performing retrieval before translation can be significantly more economical than performing translation before retrieval. So when the application is sufficiently important to justify the time and effort required for translation, those costs can be minimized if an effective cross-language text retrieval system is available. Even when translation is not available, there are circumstances in which cross-language text retrieval could be useful to a monolingual user. For example, a researcher might find a paper published in an unfamiliar language useful if that paper contains references to works by the same author that are in the researcher's native language.

Multilingual text retrieval can be defined as selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem which distinguishes multilingual text retrieval from its well-studied monolingual counterpart.

At the SIGIR workshop on Cross-Linguistic Information Retrieval, the participants discussed the proliferation of terminology being used to describe the field and settled on "Cross-Language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as their preferred term, so we are still some distance from reaching consensus on this matter.
