A Semantic Scattering model for the automatic interpretation of English genitives

An important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of the semantic relations expressed by English genitives. It introduces a learning model, Semantic Scattering, based on a statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in identifying the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences. For of-genitives it achieved an F-measure of 79.80 per cent, well above the 40.60 per cent obtained with Decision Trees, the 50.55 per cent obtained with Naive Bayes, and the 72.13 per cent obtained with Support Vector Machines on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent with Semantic Scattering, 47.00 per cent with Decision Trees, 43.70 per cent with Naive Bayes, and 70.32 per cent with Support Vector Machines. These results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (here, the two types of genitive construction) encode different semantic information and should be modeled separately, with a different model built for each pattern.
