Cross-lingual Semantic Specialization via Lexical Relation Induction

Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) inducing noisy constraints in the target language through automatic word translation; and 2) filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source-language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We demonstrate the effectiveness of our method through intrinsic word similarity evaluation in 8 languages and on 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.
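To make the two-step transfer concrete, below is a minimal, hypothetical Python sketch of the constraint-transfer pipeline. It assumes a bilingual dictionary for step 1 (automatic word translation) and a pretrained relation-prediction scorer for step 2 (constraint filtering). All names here (`induce_constraints`, `filter_constraints`, `bilingual_dict`, `relation_scorer`, `threshold`) are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the two-step constraint-transfer pipeline described in the abstract.
# All identifiers are illustrative assumptions, not the authors' code.

from typing import Callable, Dict, Iterable, List, Tuple

Constraint = Tuple[str, str, str]  # (word_1, word_2, relation), e.g. ("car", "automobile", "synonym")


def induce_constraints(
    source_constraints: Iterable[Constraint],
    bilingual_dict: Dict[str, List[str]],
) -> List[Constraint]:
    """Step 1: project source-language constraints into the target language
    by translating both words with an automatically induced bilingual dictionary.
    The result is noisy, e.g. because word-level translation ignores polysemy."""
    induced = []
    for w1, w2, rel in source_constraints:
        for t1 in bilingual_dict.get(w1, []):
            for t2 in bilingual_dict.get(w2, []):
                induced.append((t1, t2, rel))
    return induced


def filter_constraints(
    noisy_constraints: Iterable[Constraint],
    relation_scorer: Callable[[str, str, str], float],
    threshold: float = 0.5,  # assumed confidence cut-off, not from the paper
) -> List[Constraint]:
    """Step 2: keep only pairs for which a relation-prediction model
    (trained on clean source-language constraints) is sufficiently confident."""
    return [
        (w1, w2, rel)
        for w1, w2, rel in noisy_constraints
        if relation_scorer(w1, w2, rel) >= threshold
    ]


if __name__ == "__main__":
    # Toy usage with a hand-made EN->DE dictionary and a dummy scorer.
    en_constraints = [("car", "automobile", "synonym"), ("hot", "cold", "antonym")]
    en_de = {"car": ["Auto", "Wagen"], "automobile": ["Auto"], "hot": ["heiß"], "cold": ["kalt"]}
    noisy = induce_constraints(en_constraints, en_de)
    clean = filter_constraints(noisy, relation_scorer=lambda a, b, r: 0.9)
    print(clean)  # refined constraints, ready for specializing the German vectors
```

In the method itself, the scorer would be the relation prediction model trained on the clean English constraints and applied to candidate target-language pairs; the filtered constraints then feed a standard vector-specialization procedure over the target-language distributional space.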
