CROSS-LANGUAGE DOCUMENT RETRIEVAL BY USING NONLINEAR SEMANTIC MAPPING

A nonlinear semantic mapping procedure is proposed for cross-language document retrieval. The method relies on a nonlinear space reduction technique for constructing semantic embeddings of multilingual document collections. In the proposed method, an independent embedding is constructed for each language in the multilingual collection and the similarities among the resulting semantic representations are used for cross-language document retrieval. Two variants of the proposed method are implemented and compared with a standard cross-language information retrieval technique. It is shown that the proposed method outperforms the conventional one.
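The idea of building one embedding per language and then comparing the resulting representations can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the procedure described in the paper: classical (Torgerson) multidimensional scaling on cosine distances stands in for the nonlinear reduction technique, the two toy collections are word-for-word translations of each other, and documents are matched across languages by their profile of embedded distances to the other documents in their own collection (a choice that sidesteps the arbitrary rotation between two independently built maps). All function names are hypothetical.

```python
import numpy as np

def term_matrix(docs):
    """Raw term-count matrix (documents x vocabulary), whitespace tokens."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.split():
            counts[i, index[tok]] += 1
    return counts

def cosine_distances(counts):
    """Pairwise cosine distances (1 - cosine similarity) between rows."""
    gram = counts @ counts.T
    norms = np.sqrt(np.diag(gram))
    dist = 1.0 - gram / np.outer(norms, norms)
    np.fill_diagonal(dist, 0.0)
    return np.clip(dist, 0.0, None)

def classical_mds(dist, dims):
    """Classical MDS: a stand-in (assumption) for the paper's reduction step."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:dims]
    # Cosine distances are not Euclidean, so clip any negative eigenvalues.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def distance_profiles(points):
    """Row i holds the embedded distances from document i to every document."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def retrieve(query_idx, profiles_src, profiles_tgt):
    """Index of the target-language document whose profile is closest."""
    gaps = np.linalg.norm(profiles_tgt - profiles_src[query_idx], axis=1)
    return int(np.argmin(gaps))

# Toy parallel collection (hypothetical data): document i in one language
# is a word-for-word translation of document i in the other.
docs_en = ["king crown king", "queen crown", "sea land", "king sea"]
docs_es = ["rey corona rey", "reina corona", "mar tierra", "rey mar"]

# One independent two-dimensional embedding per language.
emb_en = classical_mds(cosine_distances(term_matrix(docs_en)), dims=2)
emb_es = classical_mds(cosine_distances(term_matrix(docs_es)), dims=2)

prof_en = distance_profiles(emb_en)
prof_es = distance_profiles(emb_es)

# Use each English document as a query against the Spanish collection.
matches = [retrieve(i, prof_en, prof_es) for i in range(len(docs_en))]
print(matches)  # [0, 1, 2, 3]
```

Because the toy translations preserve term counts exactly, both languages yield the same distance geometry and every query recovers its own translation. Real multilingual collections would match only approximately, and queries from outside the collection would need an additional placement step that this sketch does not attempt.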
