Mapping Lexical Entries in a Verbs Database to WordNet Senses

This paper describes automatic techniques for mapping 9611 entries in a database of English verbs to WordNet senses. The verbs were initially grouped into 491 classes based on syntactic features. Mapping these verbs into WordNet senses provides a resource that supports disambiguation in multilingual applications such as machine translation and cross-language information retrieval. Our techniques make use of (1) a training set of 1791 disambiguated entries, representing 1442 verb entries from 167 classes; (2) word sense probabilities derived from frequency counts in a tagged corpus; (3) semantic similarity of WordNet senses for verbs within the same class; and (4) probabilistic correlations between WordNet data and attributes of the verb classes. The best results achieved 72% precision and 58% recall, compared with a lower bound of 62% precision and 38% recall for assigning the most frequently occurring WordNet sense and an upper bound of 87% precision and 75% recall for human judgment.
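To make the lower-bound comparison concrete, the following minimal sketch shows one way a most-frequent-sense baseline could be scored with precision and recall when an entry may have more than one correct sense. It is an illustration under stated assumptions, not the procedure used in the paper: the entry names, sense identifiers, frequency counts, and gold labels are all invented toy data, and the function names `most_frequent_sense` and `precision_recall` are hypothetical.

```python
from typing import Dict, Set, Tuple


def most_frequent_sense(sense_counts: Dict[str, int]) -> str:
    """Return the sense with the highest corpus count (ties broken alphabetically)."""
    return max(sorted(sense_counts), key=sense_counts.get)


def precision_recall(
    predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Score predicted sense sets against gold sense sets, pooled over all entries."""
    correct = sum(len(predicted.get(entry, set()) & senses) for entry, senses in gold.items())
    n_predicted = sum(len(predicted.get(entry, set())) for entry in gold)
    n_gold = sum(len(senses) for senses in gold.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall


# Toy data (assumed for illustration only): per-entry sense frequency counts
# such as might come from a sense-tagged corpus, plus hand-assigned gold senses.
counts = {
    "drop": {"drop%1": 12, "drop%2": 5, "drop%3": 1},
    "leave": {"leave%1": 30, "leave%2": 8},
    "fall": {"fall%1": 20, "fall%2": 4},
}
gold = {
    "drop": {"drop%1", "drop%2"},
    "leave": {"leave%2"},
    "fall": {"fall%1"},
}

# Baseline: assign each entry only its single most frequent sense.
predicted = {entry: {most_frequent_sense(c)} for entry, c in counts.items()}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the baseline assigns exactly one sense per entry while some entries have several correct senses, its recall is capped below its precision on this toy data, which mirrors the direction of the gap between the precision and recall figures reported for the lower bound.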
