Semantically Motivated Hebrew Verb-Noun Multi-Word Expressions Identification

Identification of Multi-Word Expressions (MWEs) lies at the heart of many natural language processing applications. In this research, we deal with a particular type of Hebrew MWE, the Verb-Noun MWE (VN-MWE), which combines a verb and a noun, with or without other words. Most prior work on MWE classification has focused on linguistic and statistical information. In this paper, we claim that it is essential to utilize semantic information. To this end, we propose a semantically motivated indicator for classifying VN-MWEs, define features derived from various semantic spaces, and combine them in a supervised classification framework. We empirically demonstrate that our semantic feature set yields better performance than the common linguistic and statistical feature sets, and that combining semantic features contributes to the VN-MWE identification task.
