Word-Sense Disambiguation Using Decomposable Models

Most probabilistic classifiers used for word-sense disambiguation have either been based on only one contextual feature or have relied on a model that is simply assumed to characterize the interdependencies among multiple contextual features. In this paper, we describe a different approach to formulating a probabilistic model: one that uses multiple contextual features for word-sense disambiguation without requiring untested assumptions about the form of the model. We also present a case study of the performance of models produced in this manner for disambiguating the noun "interest". Using this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data.
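To make the idea concrete, the following is a minimal sketch of one particularly simple decomposable model: the Naive Bayes factorization, in which each contextual feature interacts only with the sense variable. (This is an illustrative assumption, not the model-selection procedure of the paper, which searches among candidate decomposable models; the function names and the toy features are hypothetical.) Under this factorization the joint distribution P(sense, f1, ..., fn) reduces to P(sense) times the product of P(fi | sense), so only low-order marginal counts need to be estimated:

```python
import math
from collections import Counter, defaultdict

def train(instances):
    """Estimate the marginal counts needed by the factored joint.

    instances: list of (sense, [feature, ...]) pairs.
    """
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)  # feat_counts[sense][feature]
    for sense, feats in instances:
        sense_counts[sense] += 1
        for f in feats:
            feat_counts[sense][f] += 1
    return sense_counts, feat_counts

def classify(sense_counts, feat_counts, feats, alpha=1.0):
    """Pick the sense maximizing the factored joint (Laplace-smoothed).

    Because the model is decomposable, the joint probability is a
    product of marginal terms, computed here in log space.
    """
    total = sum(sense_counts.values())
    vocab = {f for c in feat_counts.values() for f in c}
    best, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log P(sense)
        denom = sum(feat_counts[sense].values()) + alpha * len(vocab)
        for f in feats:
            # log P(feature | sense), smoothed so unseen features
            # do not zero out the product
            score += math.log((feat_counts[sense][f] + alpha) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best
```

Richer decomposable models would additionally include selected interaction terms among the contextual features themselves; the point of the approach is that only the systematic interactions are retained, keeping the parameter count small relative to the full joint distribution.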
