Interpreting compound nouns with kernel methods

This paper presents a classification-based approach to noun-noun compound interpretation within the statistical learning framework of kernel methods. In this framework, the primary modelling task is to define measures of similarity between data items, formalised as kernel functions. We consider the different sources of information that are useful for understanding compounds and proceed to define kernels that compute similarity between compounds in terms of these sources. In particular, these kernels implement intuitive notions of lexical and relational similarity and can be computed using distributional information extracted from text corpora. We report performance on classification experiments with three semantic relation inventories at different levels of granularity, demonstrating in each case that combining lexical and relational information sources is beneficial and gives better performance than either source taken alone. The data used in our experiments are taken from general English text, but our methods are also applicable to other domains and potentially to other languages where noun-noun compounding is frequent and productive.
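The idea of combining a lexical and a relational kernel can be illustrated with a minimal sketch. The code below is an illustration under stated assumptions, not the paper's implementation: the cosine kernel is a simple stand-in for whichever distributional kernels the paper defines, the equal weighting is arbitrary, and the function names (`cosine_gram`, `combine_kernels`) and the toy feature counts are hypothetical. It relies only on the standard fact that a non-negative weighted sum of positive semi-definite kernels is itself a valid kernel.

```python
import numpy as np


def cosine_gram(X):
    """Gram matrix of the cosine kernel (a stand-in distributional kernel)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero for empty rows
    Xn = X / norms
    return Xn @ Xn.T


def combine_kernels(grams, weights=None):
    """Non-negative weighted sum of Gram matrices; the sum stays PSD."""
    grams = [np.asarray(g, dtype=float) for g in grams]
    if weights is None:
        weights = [1.0 / len(grams)] * len(grams)  # assumed equal weighting
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to preserve PSD-ness")
    return sum(w * g for w, g in zip(weights, grams))


# Toy, made-up co-occurrence counts for three compounds (rows).
# "Lexical" features describe the constituent nouns; "relational" features
# describe contexts in which both constituents appear together.
lexical_features = np.array([[3, 0, 1, 2],
                             [2, 1, 0, 3],
                             [0, 4, 2, 0]])
relational_features = np.array([[1, 0, 2],
                                [1, 1, 2],
                                [0, 3, 0]])

K_lex = cosine_gram(lexical_features)
K_rel = cosine_gram(relational_features)
K_combined = combine_kernels([K_lex, K_rel])
```

A combined Gram matrix of this kind could then be passed to any kernel classifier that accepts precomputed kernels, for example a support vector machine such as scikit-learn's `SVC(kernel="precomputed")`.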
