Cross-language information access to multilingual collections on the internet

The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual-dictionary and monolingual-corpus-based approaches are adopted to select suitable translations of query terms. A machine transliteration algorithm is introduced to handle proper-name searching. We consider several design issues for document translation: which material is translated, what role HTML tags play in translation, how translation speed trades off against translation quality, and in what form the translated result is presented. About 100,000 Web pages translated in the last four months of 1997 are used for a quantitative study of online, real-time Web page translation.
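
To make the idea of combining a bilingual dictionary with a monolingual corpus concrete, the following is a minimal sketch and not MTIR's actual algorithm, which the abstract does not specify. It assumes that each Chinese query term has several English candidates in the dictionary and that the preferred combination is the one whose words co-occur most often in an English corpus. The dictionary entries, the corpus, and the function names `cooccurrence_counts` and `select_translations` are all hypothetical.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy bilingual dictionary: Chinese term -> English candidates.
TOY_DICTIONARY = {
    "病毒": ["virus"],
    "程式": ["program", "procedure", "formula"],
}

# Hypothetical toy monolingual (English) corpus, one token list per sentence.
TOY_CORPUS = [
    ["computer", "virus", "program"],
    ["virus", "infects", "program"],
    ["formula", "milk"],
    ["court", "procedure"],
]


def cooccurrence_counts(corpus):
    """Count how often each unordered word pair appears in the same sentence."""
    counts = Counter()
    for sentence in corpus:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts


def select_translations(query_terms, dictionary, counts):
    """Pick one candidate per query term, maximising total pairwise co-occurrence."""
    candidate_lists = [dictionary[term] for term in query_terms]

    def score(choice):
        # Sum the co-occurrence counts over all pairs of chosen translations.
        return sum(counts[tuple(sorted(pair))] for pair in combinations(choice, 2))

    # Exhaustive search over combinations; acceptable only for short queries.
    return list(max(product(*candidate_lists), key=score))


if __name__ == "__main__":
    counts = cooccurrence_counts(TOY_CORPUS)
    print(select_translations(["病毒", "程式"], TOY_DICTIONARY, counts))
```

In this toy setting the ambiguous term 程式 resolves to "program" because only that candidate co-occurs with "virus" in the corpus. The exhaustive search grows exponentially with query length, so a real system would need a greedy or otherwise pruned selection strategy.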
