Multilingual Web retrieval: An experiment in English–Chinese business intelligence

As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem and has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English–Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise. © 2006 Wiley Periodicals, Inc.
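To make the contrast between word-by-word translation and co-occurrence-based translation concrete, the following is a minimal sketch and not the authors' implementation. The dictionary entries, the co-occurrence counts, and the function names (`word_by_word`, `cooccurrence_translate`) are all hypothetical, chosen only to illustrate the general idea: among the candidate translations of each query term, prefer the combination whose members co-occur most often in a target-language corpus.

```python
from itertools import combinations, product

# Hypothetical toy bilingual dictionary (assumption, not from the article).
# Each English term lists candidate Chinese translations in dictionary order.
TOY_DICT = {
    "interest": ["兴趣", "利息"],  # hobby sense first, financial sense second
    "bank": ["河岸", "银行"],      # river bank first, financial bank second
}

# Hypothetical co-occurrence counts from an imagined Chinese business corpus.
TOY_COOC = {
    frozenset(("利息", "银行")): 120,
    frozenset(("兴趣", "银行")): 4,
    frozenset(("兴趣", "河岸")): 2,
    frozenset(("利息", "河岸")): 1,
}


def cooccurrence(a, b):
    """Return the toy co-occurrence count for an unordered pair of terms."""
    return TOY_COOC.get(frozenset((a, b)), 0)


def word_by_word(query):
    """Baseline: take the first dictionary translation of every term."""
    return [TOY_DICT.get(term, [term])[0] for term in query]


def cooccurrence_translate(query):
    """Pick the translation combination with the highest pairwise co-occurrence."""
    # Terms missing from the dictionary are passed through untranslated.
    candidates = [TOY_DICT.get(term, [term]) for term in query]

    def score(combo):
        return sum(cooccurrence(a, b) for a, b in combinations(combo, 2))

    return list(max(product(*candidates), key=score))


if __name__ == "__main__":
    query = ["interest", "bank"]
    print(word_by_word(query))
    print(cooccurrence_translate(query))
```

With these toy counts, the baseline keeps the first-listed senses of both terms, while the co-occurrence score selects the financial senses. Scoring every combination is exponential in query length, which is tolerable here only because Web queries tend to be short; a real system would need a more careful search and real corpus statistics.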
