It is common practice in computational linguistics to use selectional constraints and semantic type hierarchies as primary knowledge resources for word sense disambiguation (cf. Jurafsky and Martin 2000). The most widely adopted methodology starts from a given ontology of types (e.g. WordNet, cf. Miller and Fellbaum 2007) and uses its implied conceptual categories to specify the combinatorial constraints on lexical items. Semantic typing information about selectional preferences is then used to guide the induction of senses for both nouns and verbs in texts. Practical results have shown, however, that this approach faces a number of problems. For instance, corpus-driven pattern analysis shows (cf. Hanks et al. 2007) that the paradigmatic sets of words that populate specific argument slots within the same verb sense do not map neatly onto conceptual categories, as they often include words belonging to different types. Moreover, the internal composition of these sets changes from verb to verb, so that no stable generalization seems possible as to which lexemes belong to which semantic type (cf. Hanks and Jezek 2008). In this paper, we claim that these are not accidental facts related to the contingencies of a given ontology, but rather the result of an attempt to map distributional language behaviour onto semantic type systems that are not sufficiently grounded in real corpus data. We report on the work carried out within the CPA project (cf. Hanks 2009) to build an ontology that satisfies this requirement, and we explore its advantages in terms of empirical validity over more speculative ontologies.
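The mismatch between lexical sets and conceptual categories can be made concrete with a small sketch. It measures how much of an argument-slot lexical set is subsumed by a single semantic type and lists the fillers that fall outside it. The type hierarchy, the noun-to-type assignments, and the slot fillers are invented for illustration; they are not taken from WordNet, from the CPA ontology, or from corpus counts, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (child type -> parent type); not a real ontology.
TOY_HIERARCHY: Dict[str, str] = {
    "Event": "Entity",
    "Location": "Entity",
    "Institution": "Entity",
}

# Hypothetical noun-to-type assignments, invented for this sketch.
TOY_NOUN_TYPES: Dict[str, str] = {
    "meeting": "Event",
    "conference": "Event",
    "funeral": "Event",
    "school": "Institution",
    "church": "Location",
}


def subsumers(semantic_type: str, hierarchy: Dict[str, str]) -> List[str]:
    """Return the type itself followed by all of its ancestors."""
    chain = [semantic_type]
    while chain[-1] in hierarchy:
        chain.append(hierarchy[chain[-1]])
    return chain


def outliers(lexical_set: Set[str], semantic_type: str,
             noun_types: Dict[str, str], hierarchy: Dict[str, str]) -> Set[str]:
    """Return the fillers that are NOT subsumed by the given type."""
    return {
        noun for noun in lexical_set
        if noun not in noun_types
        or semantic_type not in subsumers(noun_types[noun], hierarchy)
    }


def coverage(lexical_set: Set[str], semantic_type: str,
             noun_types: Dict[str, str], hierarchy: Dict[str, str]) -> float:
    """Return the fraction of the lexical set subsumed by the given type."""
    if not lexical_set:
        return 0.0
    missed = outliers(lexical_set, semantic_type, noun_types, hierarchy)
    return (len(lexical_set) - len(missed)) / len(lexical_set)


# Invented fillers for the object slot of a single verb sense.
toy_slot = {"meeting", "conference", "funeral", "school", "church"}

event_coverage = coverage(toy_slot, "Event", TOY_NOUN_TYPES, TOY_HIERARCHY)
entity_coverage = coverage(toy_slot, "Entity", TOY_NOUN_TYPES, TOY_HIERARCHY)
event_outliers = outliers(toy_slot, "Event", TOY_NOUN_TYPES, TOY_HIERARCHY)
```

In this toy setting, the specific type covers only part of the set, while the only type that covers all of it is the uninformative top node. This is the kind of trade-off that a type system grounded in corpus data is meant to address.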
[1] James Pustejovsky, et al. A Pattern Dictionary for Natural Language Processing, 2005.
[2] Ted Briscoe, et al. Semi-productive Polysemy and Sense Extension, 1995, J. Semant.
[3] Patrick Hanks. Corpus pattern analysis, 2004.
[4] Anna Rumshisky, et al. Detecting selectional behavior of complex types in text, 2007.
[5] Christiane Fellbaum, et al. WordNet then and now, 2007, Lang. Resour. Evaluation.
[6] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.
[7] James Pustejovsky, et al. Automated Induction of Sense in Context, 2004, COLING.
[8] James Pustejovsky, et al. Constructing a Corpus-based Ontology Using Model Bias, 2006, FLAIRS.
[9] James Pustejovsky, et al. Semantic Coercion in Language: Beyond Distributional Analysis, 2012.
[10] James Pustejovsky, et al. Towards a Generative Lexical Resource: The Brandeis Semantic Ontology, 2006, LREC.
[11] Elisabetta Jezek, et al. When GL meets the corpus: a data-driven investigation of semantic types and coercion phenomena, 2007.
[12] Adam Kilgarriff, et al. The Sketch Engine, 2004.
[13] James H. Martin, et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2000.
[14] James Pustejovsky, et al. The Generative Lexicon, 1995, CL.