Extending WordNet with Fine-Grained Collocational Information via Supervised Distributional Learning

WordNet is probably the best-known lexical resource in Natural Language Processing. While it is widely regarded as a high-quality repository of concepts and semantic relations, updating and extending it manually is costly. One type of information that could add substantial value to WordNet is collocational information, which is paramount in tasks such as Machine Translation, Natural Language Generation and Second Language Learning. In this paper, we present ColWordNet (CWN), an extended version of WordNet with fine-grained collocational information, introduced automatically by a method that exploits linear relations between analogous sense-level embedding spaces. We perform both intrinsic and extrinsic evaluations, and release CWN for the use and scrutiny of the community.
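The abstract names the core idea without spelling it out. One reading of "linear relations between analogous embedding spaces" is a learned linear map from the vector of a base sense (e.g. *rain*) to the vector of its collocate (e.g. *heavy*) for one collocation category. The snippet below is a minimal sketch of that reading only, assuming the map is fitted by ordinary least squares on known (base, collocate) pairs and new collocates are retrieved by cosine similarity. It is not the paper's implementation. The function names `fit_linear_map` and `predict_collocate` and all vectors are hypothetical toy values.

```python
from typing import Dict, List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a float copy of `a` with the columns of `b`.
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(x: Matrix, y: Matrix) -> Matrix:
    """Hypothetical helper: least-squares W minimising ||x W - y||^2
    via the normal equations (x^T x) W = x^T y."""
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))


def predict_collocate(base: List[float], w: Matrix,
                      candidates: Dict[str, List[float]]) -> str:
    """Hypothetical helper: project a base vector with W and return the
    candidate collocate whose vector is closest by cosine similarity."""
    projected = matmul([base], w)[0]
    return max(candidates, key=lambda name: cosine(projected, candidates[name]))


# Toy training pairs (assumed): rows of X are base-sense vectors, rows of Y
# are the vectors of their attested collocates for one collocation category.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
W = fit_linear_map(X, Y)
candidates = {"heavy": [1.0, 3.0], "light": [3.0, -1.0]}
print(predict_collocate([1.0, 2.0], W, candidates))
```

Because the toy targets are an exact linear image of the inputs, the fitted map recovers that transformation, and the unseen base vector is projected onto the direction of the `heavy` candidate. In a realistic setting, one map would be fitted per collocation category on high-dimensional sense embeddings, and the ranked neighbours would be filtered before insertion into the resource.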
