Automatic identification of semantic relations in Italian complex nominals

This paper addresses the identification of semantic relations in Italian complex nominals (CNs) of the type N+P+N. We exploit the fact that the semantic relation, which is underspecified in most cases, is partially made explicit by the preposition. We develop an annotation framework around five semantic relations and use it to build a corpus of 1700 Italian CNs, obtaining an inter-annotator agreement of K = .695. Using these data, we train, for each preposition p, a classifier that assigns one of the five semantic relations to any CN of the type N+p+N, based on both string and supersense features. To obtain supersenses, we compare a sequential supersense tagger with a plain lookup in MultiWordNet, and find that the information obtained from the tagger yields better results.
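The per-preposition setup can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: the learner is a from-scratch multinomial Naive Bayes (the abstract does not name the classifier), `TOY_SUPERSENSES` is a hypothetical hand-written dictionary standing in for the MultiWordNet lookup, and the labels `REL_1` to `REL_3` and the training triples are invented placeholders, not the paper's five relations or its corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a supersense lookup in MultiWordNet.
TOY_SUPERSENSES = {
    "tazza": "noun.artifact", "bicchiere": "noun.artifact",
    "bottiglia": "noun.artifact", "tavolo": "noun.artifact",
    "casa": "noun.artifact", "caffè": "noun.food", "vino": "noun.food",
    "tè": "noun.food", "legno": "noun.substance",
}

def features(n1, n2, lookup=TOY_SUPERSENSES):
    """String features (the two nouns) plus their supersenses ("UNK" if missing)."""
    return ["n1=" + n1, "n2=" + n2,
            "ss1=" + lookup.get(n1, "UNK"), "ss2=" + lookup.get(n2, "UNK")]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over feature strings."""

    def fit(self, examples):
        # examples: list of (feature_list, label) pairs
        self.label_counts = Counter(label for _, label in examples)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in examples:
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.total = sum(self.label_counts.values())
        return self

    def predict(self, feats):
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def train_per_preposition(data):
    """data: (n1, prep, n2, relation) tuples; trains one classifier per preposition."""
    by_prep = defaultdict(list)
    for n1, prep, n2, rel in data:
        by_prep[prep].append((features(n1, n2), rel))
    return {prep: NaiveBayes().fit(ex) for prep, ex in by_prep.items()}

def classify(models, n1, prep, n2):
    """Route an N+p+N nominal to the classifier trained for its preposition p."""
    return models[prep].predict(features(n1, n2))

# Invented toy training triples with placeholder relation labels.
TOY_DATA = [
    ("tazza", "di", "caffè", "REL_1"),
    ("bicchiere", "di", "vino", "REL_1"),
    ("tavolo", "di", "legno", "REL_2"),
    ("casa", "di", "legno", "REL_2"),
    ("tazza", "da", "tè", "REL_3"),
]

models = train_per_preposition(TOY_DATA)
print(classify(models, "bottiglia", "di", "vino"))
```

Training one model per preposition means each classifier only has to separate the relations its preposition co-occurs with in the data. The supersense features let a noun unseen in training, such as *bottiglia* here, borrow evidence from nouns of the same class.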
