A New Supervised Learning Algorithm for Word Sense Disambiguation

The Naive Mix is a new supervised learning algorithm based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. During selection, however, a sequence of models is generated, consisting of the best-fitting model at each level of model complexity. The Naive Mix uses this sequence to define a probabilistic model, which then serves as a probabilistic classifier for word sense disambiguation. The models in the sequence are restricted to the class of decomposable log-linear models, which offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike's Information Criterion rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2), and nearest-neighbor classification (PEBLS).
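The selection-then-mixing idea can be sketched in a simplified form. The sketch below is not the paper's method: it substitutes naive-Bayes-style models over growing feature subsets for decomposable log-linear models, and the toy "line" data, feature names, and add-one smoothing are all invented for illustration. It does, however, follow the abstract's recipe: a forward sequential search scored by AIC produces the best model at each complexity level, and classification averages the sense distributions of every model in that sequence.

```python
import math
from collections import Counter

# Hypothetical toy data for the ambiguous word "line": each training
# instance pairs contextual features with a sense tag.
DATA = [
    ({"prev": "phone", "topic": "call"}, "cord"),
    ({"prev": "phone", "topic": "busy"}, "cord"),
    ({"prev": "long",  "topic": "call"}, "cord"),
    ({"prev": "long",  "topic": "wait"}, "queue"),
    ({"prev": "long",  "topic": "wait"}, "queue"),
    ({"prev": "short", "topic": "wait"}, "queue"),
]
SENSES = sorted({s for _, s in DATA})
ALL_FEATURES = sorted({f for x, _ in DATA for f in x})
VALUES = {f: sorted({x[f] for x, _ in DATA}) for f in ALL_FEATURES}

def fit(feats):
    """Fit a conditional-independence model over a feature subset
    (a stand-in for a decomposable model), with add-one smoothing."""
    prior = Counter(s for _, s in DATA)
    cond = {f: {s: Counter() for s in SENSES} for f in feats}
    for x, s in DATA:
        for f in feats:
            cond[f][s][x[f]] += 1
    return prior, cond

def joint_log_lik(feats, model):
    """Log-likelihood of the training data under the fitted model."""
    prior, cond = model
    n = sum(prior.values())
    ll = 0.0
    for x, s in DATA:
        ll += math.log((prior[s] + 1) / (n + len(SENSES)))
        for f in feats:
            ll += math.log((cond[f][s][x[f]] + 1) /
                           (prior[s] + len(VALUES[f])))
    return ll

def aic(feats):
    """AIC = 2k - 2 ln L, where k counts free parameters."""
    k = (len(SENSES) - 1) + sum(len(SENSES) * (len(VALUES[f]) - 1)
                                for f in feats)
    return 2 * k - 2 * joint_log_lik(feats, fit(feats))

def forward_sequence():
    """Forward sequential search: greedily add the feature that yields
    the lowest AIC, recording the best model at every complexity level."""
    chosen, remaining, sequence = [], list(ALL_FEATURES), [tuple()]
    while remaining:
        best = min(remaining, key=lambda f: aic(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
        sequence.append(tuple(chosen))
    return sequence

def naive_mix_classify(x):
    """The mix: average the sense distributions of all models in the
    sequence, then pick the sense with the highest total."""
    votes = Counter()
    for feats in forward_sequence():
        prior, cond = fit(feats)
        n = sum(prior.values())
        scores = {}
        for s in SENSES:
            p = (prior[s] + 1) / (n + len(SENSES))
            for f in feats:
                p *= (cond[f][s][x[f]] + 1) / (prior[s] + len(VALUES[f]))
            scores[s] = p
        z = sum(scores.values())
        for s in SENSES:
            votes[s] += scores[s] / z
    return votes.most_common(1)[0][0]

print(naive_mix_classify({"prev": "phone", "topic": "call"}))  # cord
```

On this toy data the "phone" context only ever co-occurs with the cord sense, so every non-empty model in the sequence pulls the mixture toward "cord"; averaging over the whole sequence, rather than trusting a single selected model, is what distinguishes the mix from ordinary model selection.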
