Incorporating Both Distributional and Relational Semantics in Word Representations

We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics. To this end, we employ the Alternating Direction Method of Multipliers (ADMM), which flexibly couples a distributional objective trained on raw text with a relational objective trained on WordNet. Preliminary results on knowledge base completion, analogy tests, and parsing show that word representations trained on both objectives can yield improvements on some tasks.
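As a sketch, using illustrative notation that may differ from the paper's own, the joint training can be posed in the standard ADMM form of Boyd et al. (2011): keep two copies of the embeddings, w for the distributional loss L_D on raw text and v for the relational loss L_R on WordNet, and tie them together with an equality constraint,

\min_{w,\,v}\; L_D(w) + L_R(v) \quad \text{subject to} \quad w = v,

relaxed through the augmented Lagrangian

L_\rho(w, v, y) = L_D(w) + L_R(v) + y^\top (w - v) + \tfrac{\rho}{2}\,\lVert w - v \rVert_2^2,

and minimized by alternating the three ADMM updates

w^{k+1} = \arg\min_w\; L_\rho(w, v^k, y^k),
v^{k+1} = \arg\min_v\; L_\rho(w^{k+1}, v, y^k),
y^{k+1} = y^k + \rho\,(w^{k+1} - v^{k+1}).

Each primal step touches only its own objective plus a quadratic coupling term, so the distributional and relational losses can be optimized with separate machinery while the dual variable y gradually pulls the two embedding copies together; the penalty weight \rho is a hyperparameter in this sketch, not a value taken from the paper.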
