Paper information: Distinguishing antonymy, synonymy and hypernymy with distributional and distributed vector representations and neural networks

In the last decade, computational models that distinguish semantic relations have become crucial for many applications in Natural Language Processing (NLP), such as machine translation, question answering, and sentiment analysis. These models typically distinguish semantic relations in one of two ways: they represent semantically related words as vectors in a vector space, or they use neural networks to classify the relations. In this thesis, we focus mainly on improving such computational models. Specifically, the goal of this thesis is to address the tasks of distinguishing antonymy, synonymy, and hypernymy.
For the task of distinguishing antonymy from synonymy, we propose two approaches. In the first approach, we improve both families of word vector representations: distributional and distributed vector representations. For distributional vector representations, we propose a novel weighted feature for constructing word vectors. This feature relies on distributional lexical contrast, which is capable of differentiating between antonymy and synonymy. For distributed vector representations, we propose a neural model that learns word vectors by integrating distributional lexical contrast into its objective function. The resulting word vectors can distinguish antonymy from synonymy and predict degrees of word similarity. In the second approach, we use lexico-syntactic patterns to classify antonymy and synonymy. To do so, we propose two pattern-based neural networks. The lexico-syntactic patterns are induced from syntactic parse trees and then encoded as vector representations by the neural networks. Both pattern-based neural networks improve performance over prior pattern-based methods.
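The idea of a feature weight based on lexical contrast can be made concrete with a minimal Python sketch. The abstract does not give the thesis's exact formula, so this uses a simplified weighting chosen for illustration. A feature `f` of word `w` scores higher when the synonyms of `w` that co-occur with `f` are similar to `w`. It scores lower when the antonyms of `w` that co-occur with `f` are similar to `w`. The names `contrast_weight`, `synonyms`, `antonyms`, and `cooccurs` are hypothetical, and the toy vectors and lexicon are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contrast_weight(word, feature, vectors, synonyms, antonyms, cooccurs):
    """Hypothetical lexical-contrast weight of `feature` for `word`.

    Mean similarity to the synonyms of `word` that co-occur with `feature`,
    minus mean similarity to the antonyms of `word` that co-occur with it.
    """
    def mean_sim(partners):
        sims = [cosine(vectors[word], vectors[p])
                for p in partners if feature in cooccurs.get(p, set())]
        return sum(sims) / len(sims) if sims else 0.0

    return mean_sim(synonyms.get(word, [])) - mean_sim(antonyms.get(word, []))

# Toy inputs (made up for illustration): base vectors, lexicon, co-occurrences.
vectors = {"hot": [1.0, 0.2], "warm": [0.9, 0.3], "cold": [0.8, -0.1]}
synonyms = {"hot": ["warm"]}
antonyms = {"hot": ["cold"]}
cooccurs = {"warm": {"sun"}, "cold": {"ice"}}

w_sun = contrast_weight("hot", "sun", vectors, synonyms, antonyms, cooccurs)
w_ice = contrast_weight("hot", "ice", vectors, synonyms, antonyms, cooccurs)
print(w_sun > 0, w_ice < 0)  # True True
```

Features shared with synonyms get positive weights and features shared with antonyms get negative ones. Using such weights in place of raw co-occurrence weights is meant to pull synonym vectors together and push antonym vectors apart.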
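One way to integrate lexical contrast into a neural objective is to add a contrast term to a skip-gram negative-sampling (SGNS) loss. The sketch below assumes that form and a weight `alpha`. The loss shape and every name in it (`contrast_sgns_loss`, `alpha`) are our own assumptions, not the thesis's exact objective. It only evaluates the loss for one training pair and omits the gradient updates.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def contrast_sgns_loss(w, ctx, negs, syns, ants, alpha=1.0):
    """Hypothetical loss for one (word, context) pair, to be minimized.

    SGNS term plus a lexical-contrast term that rewards similarity to
    synonyms and penalizes similarity to antonyms.
    """
    sgns = -log_sigmoid(dot(w, ctx)) - sum(log_sigmoid(-dot(w, n)) for n in negs)
    contrast = (_mean([cosine(w, s) for s in syns])
                - _mean([cosine(w, a) for a in ants]))
    return sgns - alpha * contrast

# Toy vectors: same SGNS term in both calls, only the lexicon roles swap.
w, ctx, negs = [1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]
near, far = [0.9, 0.1], [-1.0, 0.0]
good = contrast_sgns_loss(w, ctx, negs, syns=[near], ants=[far])
bad = contrast_sgns_loss(w, ctx, negs, syns=[far], ants=[near])
print(good < bad)  # True
```

The loss is lower when the word vector is close to its synonyms and far from its antonyms. Minimizing it therefore pushes the learned vectors toward the antonym/synonym separation described above.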
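Inducing a lexico-syntactic pattern from a parse tree can be illustrated as extracting the dependency path between the two target words. This sketch makes three assumptions. The parse is a hand-written toy, not the output of a real parser. Each node uses the format `lemma/POS/dependency/direction`. The target words are replaced by `X` and `Y`. The node format and the helper names (`Token`, `ancestors`, `dependency_path`) are illustrative. Encoding the resulting node sequence with a neural network is not shown.

```python
from typing import NamedTuple

class Token(NamedTuple):
    lemma: str
    pos: str
    dep: str
    head: int  # index of the head token; -1 marks the root

def ancestors(tokens, i):
    """Return [i, head(i), head(head(i)), ..., root]."""
    chain = [i]
    while tokens[chain[-1]].head != -1:
        chain.append(tokens[chain[-1]].head)
    return chain

def dependency_path(tokens, x, y):
    """Path from token x up to the lowest common ancestor, then down to y."""
    up, down = ancestors(tokens, x), ancestors(tokens, y)
    lca = next(n for n in up if n in down)  # both chains end at the root
    left = up[:up.index(lca)]
    right = list(reversed(down[:down.index(lca)]))

    def node(i, direction):
        t = tokens[i]
        lemma = "X" if i == x else "Y" if i == y else t.lemma
        return f"{lemma}/{t.pos}/{t.dep}/{direction}"

    return ([node(i, "up") for i in left] + [node(lca, "root")]
            + [node(i, "down") for i in right])

# Toy parse of "The water turned from hot to cold" (hand-written, illustrative).
tokens = [
    Token("the", "DET", "det", 1),
    Token("water", "NOUN", "nsubj", 2),
    Token("turn", "VERB", "ROOT", -1),
    Token("from", "ADP", "case", 4),
    Token("hot", "ADJ", "obl", 2),
    Token("to", "ADP", "case", 6),
    Token("cold", "ADJ", "obl", 2),
]
path = dependency_path(tokens, 4, 6)
print(path)  # ['X/ADJ/obl/up', 'turn/VERB/ROOT/root', 'Y/ADJ/obl/down']
```

Replacing the target words with `X` and `Y` lets the same pattern generalize across word pairs. A pattern-based network would then embed each node string and encode the sequence.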
For the task of distinguishing hypernymy, we propose a novel neural model that learns hierarchical embeddings for hypernymy detection and directionality. The hierarchical embeddings are learned according to two underlying aspects: (i) the similarity between hyponyms and their hypernyms is higher than the similarity between words in other relations, and (ii) a distributional hierarchy is generated between hyponyms and hypernyms. The experimental results show that the hierarchical embeddings significantly outperform state-of-the-art word embeddings.
To improve word embeddings for measuring semantic similarity and relatedness, we propose two neural models that learn word denoising embeddings by filtering noise from the original word embeddings without using any external resources. Both models take the original word embeddings as input and learn denoising matrices that filter noise from them. The word denoising embeddings improve over the original word embeddings on semantic similarity and relatedness tasks.
Furthermore, beyond English, we also evaluate the performance of computational models on Vietnamese. To this end, we introduce two novel Vietnamese datasets for (dis-)similarity and relatedness. We then use computational models to verify the two datasets and to observe how well the models adapt to Vietnamese. The results show that the models exhibit similar behaviour on the two Vietnamese datasets as on the corresponding English datasets.
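For hypernymy directionality, one way to read a hierarchy from embeddings is to treat the vector norm as a signal of generality. Under this convention, hypernyms are expected to have larger norms than their hyponyms. The sketch below assumes that convention and a hypothetical `hyper_score` that scales cosine similarity by the norm ratio. Treat both as illustrative assumptions, not the thesis's exact scoring function. The toy vectors are made up.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def hyper_score(hypo, hyper):
    """Hypothetical score: cosine similarity scaled by the norm ratio."""
    return cosine(hypo, hyper) * norm(hyper) / norm(hypo)

def predict_hypernym(a, b, vectors):
    """Directionality: the word with the larger norm is taken as the hypernym."""
    return a if norm(vectors[a]) > norm(vectors[b]) else b

# Toy vectors (made up): "animal" is given the larger norm.
vectors = {"dog": [0.6, 0.8], "animal": [1.5, 1.8]}
print(predict_hypernym("dog", "animal", vectors))  # animal
forward = hyper_score(vectors["dog"], vectors["animal"])
backward = hyper_score(vectors["animal"], vectors["dog"])
print(forward > backward)  # True
```

The cosine factor covers aspect (i): related pairs should be similar. The norm ratio covers aspect (ii): it makes the score asymmetric, so the same pair scores higher in the hyponym-to-hypernym direction.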
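The denoising step can be sketched as applying filter matrices to an original embedding. The version below assumes an iterative filter in the style of sparse coding with lateral inhibition, `y <- tanh(x Q + y S)`. Here `Q` maps the original embedding into a (possibly overcomplete) denoised space, and `S` lets output units inhibit one another. The update rule, the `tanh` nonlinearity, and the names `denoise`, `Q`, `S`, and `steps` are all assumptions. `Q` and `S` are random or fixed placeholders, and learning them is omitted.

```python
import math
import random

def vec_mat(v, m):
    """Row vector (length n) times matrix (n x k) -> list of length k."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def denoise(x, Q, S, steps=3):
    """Hypothetical filter: y <- tanh(x Q + y S), iterated `steps` times."""
    xq = vec_mat(x, Q)
    y = [math.tanh(a) for a in xq]  # initial estimate without inhibition
    for _ in range(steps):
        ys = vec_mat(y, S)  # lateral inhibition between output units
        y = [math.tanh(a + b) for a, b in zip(xq, ys)]
    return y

# Placeholder parameters: untrained random Q (4 x 8, overcomplete) and a
# fixed inhibition matrix S with zero diagonal and negative off-diagonals.
random.seed(0)
d, k = 4, 8
Q = [[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]
S = [[0.0 if i == j else -0.1 for j in range(k)] for i in range(k)]
x = [0.5, -0.2, 0.1, 0.9]  # a toy "original" embedding
y = denoise(x, Q, S)
print(len(y))  # 8
```

Setting `k` equal to `d` gives a complete (same-size) filter, and `k` larger than `d` gives an overcomplete one. With `Q` as the identity and `S` as zeros, the filter reduces to an elementwise `tanh`.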
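Word similarity and relatedness datasets are usually evaluated by correlating model scores (e.g., cosine similarities) with human ratings using Spearman's rank correlation. The abstract does not spell out the evaluation protocol for the Vietnamese datasets, so this is a generic sketch. The ratings and scores below are made-up toy numbers.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Undefined (raises ZeroDivisionError) if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

gold = [9.0, 7.5, 3.0, 1.0]      # toy human ratings for four word pairs
model = [0.80, 0.60, 0.35, 0.10]  # toy cosine scores from some embedding
rho = spearman(gold, model)
print(round(rho, 6))  # 1.0
```

Only the ranking matters, so cosine scores and human ratings on different scales can be compared directly. This is also why the same metric can be applied to English and Vietnamese datasets alike.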

Author: Kim Anh Nguyen
