Making MIRACLEs: Interactive translingual search for Cebuano and Hindi

Searching is inherently a user-centered process; people pose the questions for which machines seek answers, and ultimately people judge the degree to which retrieved documents meet their needs. Rapid development of interactive systems that use queries expressed in one language to search documents written in another poses five key challenges: (1) interaction design, (2) query formulation, (3) cross-language search, (4) construction of translated summaries, and (5) machine translation. This article describes the design of MIRACLE, an easily extensible system based on English queries that has previously been used to search French, German, and Spanish documents, and explains how the capabilities of MIRACLE were rapidly extended to accommodate Cebuano and Hindi. Evaluation results for the cross-language search component are presented for both languages, along with results from a brief full-system interactive experiment with Hindi. The article concludes with some observations on directions for further research on interactive cross-language information retrieval.
