Capturing Discriminative Attributes Using Convolution Neural Network Over ConceptNet Numberbatch Embedding

A semantic representation of text helps us understand the lexical associations between words. Capturing these associations is an integral part of understanding any language. One fundamental property that expresses such associations is ‘similarity,’ which allows similar words to be interpreted consistently. However, this property falls short on specific tasks where lexical similarity alone is not sufficient to validate the semantic representations. In this paper, the objective is to capture such semantic distinctions. The work is based on the shared task ‘Capturing Discriminative Attributes’ conducted at SemEval-2018. Our team participated in the task and achieved an F1 score of 0.658 with a GloVe representation. This paper extends that work by exploring a new embedding, ConceptNet Numberbatch. Overall, the ConceptNet word embedding improved the scores obtained with the previous rule-based feature representation. The model was further tuned over certain hyperparameters, which improved the score by as much as 6%. A comparison with another prominent embedding, FastText, is also presented. The ConceptNet model achieved a score nearly on par with the state of the art, using only a simple ensemble of features as its representation.
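To make the setup concrete, the following is a minimal sketch (not the authors' released code) of how ConceptNet Numberbatch vectors for a word pair and a candidate attribute could be fed to a small convolutional classifier for SemEval-2018 Task 10. The file path, gensim/Keras usage, filter sizes, and all other hyperparameters are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch: decide whether an attribute discriminates word1 from word2,
# using ConceptNet Numberbatch vectors as input to a small 1-D CNN.
import numpy as np
from gensim.models import KeyedVectors
from tensorflow.keras import layers, models

# Numberbatch is distributed as a word2vec-format text file; the path below
# is hypothetical and must point to a local copy of the embedding.
vectors = KeyedVectors.load_word2vec_format(
    "numberbatch-en.txt.gz", binary=False)
DIM = vectors.vector_size  # 300 for the published Numberbatch releases

def embed(word):
    """Return the Numberbatch vector for a word, or zeros if it is OOV."""
    return vectors[word] if word in vectors else np.zeros(DIM)

def triple_to_matrix(word1, word2, attribute):
    """Stack the three vectors into a 3 x DIM input for the CNN."""
    return np.stack([embed(word1), embed(word2), embed(attribute)])

# A small CNN over the 3 x DIM input; layer sizes here are illustrative only.
model = models.Sequential([
    layers.Conv1D(64, kernel_size=2, activation="relu",
                  input_shape=(3, DIM)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # 1 = attribute is discriminative
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Example usage on one Task 10 style triple (untrained prediction shown;
# training labels would come from the shared-task data).
x = triple_to_matrix("apple", "banana", "red")[np.newaxis, ...]
print(model.predict(x))
```

Stacking the two words and the attribute lets width-2 convolutional filters compare adjacent embedding rows, which loosely mirrors how the paper combines embedding-based features before classification.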

[1] Denis Paperno, et al. Capturing Discriminative Attributes in a Distributional Space: Task Proposal, 2016, RepEval@ACL.

[2] Catherine Havasi, et al. ConceptNet 5: A Large Semantic Network for Relational Knowledge, 2013, The People's Web Meets NLP.

[3] K. P. Soman, et al. Tamil word sense disambiguation using support vector machines with rich features, 2014.

[4] Prabaharan Poornachandran, et al. Scalable Framework for Cyber Threat Situational Awareness Based on Domain Name Systems Data Analysis, 2018.

[5] Alessandro Lenci, et al. How we BLESSed distributional semantic evaluation, 2011, GEMS.

[6] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[7] Eric Jones, et al. SciPy: Open Source Scientific Tools for Python, 2001.

[8] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.

[9] K. P. Soman, et al. AmritaNLP at SemEval-2018 Task 10: Capturing discriminative attributes using convolution neural network over global vector representation, 2018, SemEval@NAACL-HLT.

[10] Angeliki Lazaridou, et al. The red one!: On learning to refer to things based on discriminative properties, 2016, ACL.

[11] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[12] Petr Sojka, et al. Software Framework for Topic Modelling with Large Corpora, 2010.

[13] Robyn Speer, et al. ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge, 2017, SemEval.

[14] Marco Baroni, et al. The red one!: On learning to refer to things based on their discriminative properties, 2016.

[15] Catherine Havasi, et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2016, AAAI.

[16] Mark S. Seidenberg, et al. Semantic feature production norms for a large set of living and nonliving things, 2005, Behavior Research Methods.

[17] Alessandro Lenci, et al. SemEval-2018 Task 10: Capturing Discriminative Attributes, 2018, SemEval.