Issues and Challenges in Building Multilingual Information Access Systems

In this chapter, the authors begin by highlighting the importance of cross-lingual and multilingual information retrieval and access as research areas. They then distinguish among Cross Language Information Retrieval (CLIR), Multilingual Information Retrieval (MLIR), Cross Language Information Access (CLIA), and Multilingual Information Access (MLIA). Subsequent sections outline the issues and challenges in these areas and discuss various approaches to multilingual information access, including machine learning-based and knowledge-based approaches. Drawing on their experience of building an MLIA system, the authors describe its architecture and its subsystems, ranging from query processing to output generation. The chapter closes with a discussion of evaluation aspects of MLIA and CLIA systems.
