Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language-Specific Rules

Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that have similar distributional signatures. These effects are detrimental for language understanding systems, which may infer that 'inexpensive' is a rephrasing of 'expensive' or may fail to associate 'acquire' with 'acquires'. In this work, we propose a novel morph-fitting procedure which moves past the use of curated semantic lexicons for improving distributional vector spaces. Instead, our method injects morphological constraints generated using simple language-specific rules, pulling inflectional forms of the same word close together and pushing derivational antonyms far apart. In intrinsic evaluation over four languages, we show that our approach: 1) improves low-frequency word estimates; and 2) boosts the semantic quality of the entire word vector collection. Finally, we show that morph-fitted vectors yield large gains in the downstream task of dialogue state tracking, highlighting the importance of morphology for tackling long-tail phenomena in language understanding tasks.
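
To make the idea of rule-generated constraints concrete, the following is a minimal sketch of how such pairs might be extracted from a vocabulary. The suffix and prefix lists, the function names, and the toy vocabulary are all hypothetical and chosen for illustration; they are not the rule sets used in the paper, and the step that actually moves the vectors is not shown.

```python
# Hypothetical toy rules for English, for illustration only.
INFLECTION_SUFFIXES = ("s", "ed", "ing")   # assumed inflectional endings
ANTONYM_PREFIXES = ("in", "un", "dis")     # assumed antonymy-marking prefixes


def attract_pairs(vocab):
    """Pairs (word, word + suffix) where both forms occur in the vocabulary.

    These stand in for inflectional forms that should be pulled together.
    """
    vocab = set(vocab)
    return {
        (word, word + suffix)
        for word in vocab
        for suffix in INFLECTION_SUFFIXES
        if word + suffix in vocab
    }


def repel_pairs(vocab):
    """Pairs (word, prefix + word) where both forms occur in the vocabulary.

    These stand in for derivational antonyms that should be pushed apart.
    """
    vocab = set(vocab)
    return {
        (word, prefix + word)
        for word in vocab
        for prefix in ANTONYM_PREFIXES
        if prefix + word in vocab
    }


# Toy vocabulary built from the examples in the abstract.
toy_vocab = ["acquire", "acquires", "expensive", "inexpensive"]
attract = attract_pairs(toy_vocab)
repel = repel_pairs(toy_vocab)
```

Restricting candidates to forms that already occur in the vocabulary keeps the sketch from inventing non-words, but naive affix matching of this kind would still over-generate on a real vocabulary (for example, treating 'in' + 'form' as an antonym pair), so any practical rule set would need more care.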
