Paper information - SEARCHER: Shared Embedding Architecture for Effective Retrieval

We describe an approach to cross-lingual information retrieval (CLIR) that does not rely on explicit translation of either document or query terms. Instead, both queries and documents are mapped into a shared embedding space, where retrieval is performed. We discuss potential advantages of the approach in handling polysemy and synonymy. We present a method for training the model and give details of its implementation. We report experimental results for two cases: Somali-English and Bulgarian-English CLIR.
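The retrieval step described in the abstract, ranking documents by their proximity to a query in a shared embedding space, can be illustrated with a minimal sketch. It is not the paper's model: the vectors below are hypothetical toy values standing in for the output of trained query and document encoders, and cosine similarity is an assumed scoring function. The names `DOC_EMBEDDINGS`, `cosine`, and `retrieve` are invented for illustration.

```python
import math

# Hypothetical toy embeddings. In a real system these would be produced by
# trained encoders that map queries and documents, possibly written in
# different languages, into the same vector space.
DOC_EMBEDDINGS = {
    "doc_weather": [0.9, 0.1, 0.0],
    "doc_health": [0.1, 0.9, 0.1],
    "doc_sports": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_embeddings, top_k=2):
    """Return the ids of the top_k documents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Assumed embedding of a query; no term-by-term translation is involved,
# only a comparison of vectors in the shared space.
query_vec = [0.2, 0.8, 0.0]
ranking = retrieve(query_vec, DOC_EMBEDDINGS)
print(ranking)
```

Because scoring happens entirely in the vector space, a query and a document can match without sharing any surface terms, which is the property the abstract associates with handling synonymy.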
