Learning multi-prototype word embedding from single-prototype word embedding with integrated knowledge

Highlights:
- A mini-context word sense disambiguation method with an adapted Lesk algorithm is proposed.
- A new initialization approach is proposed to balance speed and performance.
- An improved approximation algorithm is proposed to support supervised learning.
- The framework can utilize any sense inventory that provides word-sense definitions.

Distributional semantic models (DSMs) and word embeddings are widely used to predict semantic similarity and relatedness. However, context-aware similarity and relatedness prediction remains challenging because most DSMs and word embeddings use a single vector per word, without accounting for polysemy and homonymy. In this paper, we propose a supervised fine-tuning framework that transforms existing single-prototype word embeddings into multi-prototype word embeddings based on lexical semantic resources. Because it operates as a post-processing step, the proposed framework is compatible with any sense inventory and any word embedding. To test the proposed learning framework, both intrinsic and extrinsic evaluations are conducted. Experimental results on 3 tasks with 8 datasets show that the multi-prototype word representations learned by the proposed framework outperform single-prototype word representations.
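To make the pipeline concrete, the following is a minimal Python sketch of the two ingredients the abstract names: an adapted-Lesk disambiguation step over a mini-context, and the derivation of a sense-specific vector from a single-prototype embedding. It is an illustrative approximation only; the embedding dictionary, the gloss-averaging combination, and the overlap scoring are simplifying assumptions and do not reproduce the paper's actual supervised fine-tuning procedure.

```python
# Minimal sketch: adapted-Lesk WSD over a mini-context, then a sense-specific
# vector built from a single-prototype embedding. Illustrative assumptions
# throughout; not the paper's learned transformation.
from typing import Dict, Optional
import numpy as np
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')


def extended_gloss(synset) -> set:
    """Bag of words from a synset's gloss plus glosses of related synsets
    (hypernyms/hyponyms), in the spirit of the adapted Lesk algorithm."""
    words = set(synset.definition().lower().split())
    for rel in synset.hypernyms() + synset.hyponyms():
        words.update(rel.definition().lower().split())
    return words


def adapted_lesk(word: str, mini_context: str):
    """Pick the WordNet sense whose extended gloss overlaps most with the
    words of the mini-context surrounding the target word."""
    context = set(mini_context.lower().split())
    best, best_score = None, -1
    for synset in wn.synsets(word):
        score = len(extended_gloss(synset) & context)
        if score > best_score:
            best, best_score = synset, score
    return best


def sense_vector(word: str, mini_context: str,
                 emb: Dict[str, np.ndarray]) -> Optional[np.ndarray]:
    """Derive a sense-specific vector by mixing the word's single-prototype
    vector with the mean vector of its disambiguated sense's gloss words.
    (A hypothetical stand-in for the supervised fine-tuning step.)"""
    synset = adapted_lesk(word, mini_context)
    if synset is None or word not in emb:
        return emb.get(word)
    gloss_vecs = [emb[w] for w in synset.definition().lower().split() if w in emb]
    if not gloss_vecs:
        return emb[word]
    return 0.5 * emb[word] + 0.5 * np.mean(gloss_vecs, axis=0)
```

Any pretrained single-prototype embedding exposed as a word-to-vector mapping (for example, vectors loaded from GloVe or word2vec into a dict) can be passed as `emb`, so the same word receives different vectors in different mini-contexts.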
