Class-Based Probability Estimation Using a Semantic Hierarchy

This article concerns the estimation of a particular kind of probability, namely, the probability of a noun sense appearing as a particular argument of a predicate. In order to overcome the accompanying sparse-data problem, the proposal here is to define the probabilities in terms of senses from a semantic hierarchy and exploit the fact that the senses can be grouped into classes consisting of semantically similar senses. There is a particular focus on the problem of how to determine a suitable class for a given sense, or, alternatively, how to determine a suitable level of generalization in the hierarchy. A procedure is developed that uses a chi-square test to determine a suitable level of generalization. In order to test the performance of the estimation method, a pseudo-disambiguation task is used, together with two alternative estimation methods. Each method uses a different generalization procedure; the first alternative uses the minimum description length principle, and the second uses Resnik's measure of selectional preference. In addition, the performance of our method is investigated using both the standard Pearson chi-square statistic and the log-likelihood chi-square statistic.
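As a rough illustration of the two statistics named above, the following sketch computes the Pearson chi-square statistic and the log-likelihood chi-square statistic for a contingency table of counts. It is not the article's generalization procedure, which the abstract does not spell out. The function names, the table layout (rows as candidate classes, columns as "appears as the argument" versus "does not") and the counts are all assumptions made for the example.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of rows and columns.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def log_likelihood_chi_square(table):
    """Log-likelihood statistic: 2 * sum over cells of observed * ln(observed / expected).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


def degrees_of_freedom(table):
    """Degrees of freedom for an r x c table: (r - 1) * (c - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: one row per candidate class, columns are
# [times seen as the argument, times seen elsewhere].
counts = [[20, 80], [10, 90], [15, 85]]
x2 = pearson_chi_square(counts)
g2 = log_likelihood_chi_square(counts)
df = degrees_of_freedom(counts)
```

Either statistic would then be compared against a chi-square critical value with `df` degrees of freedom. The two tend to agree when counts are large and can diverge when expected counts are small, which is one reason to compare them.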
