Naive Mixes for Word Sense Disambiguation
The Naive Mix is a new supervised learning algorithm based on sequential model selection. The usual objective of model selection is to find a single probabilistic model that adequately characterizes, i.e. fits, the data in a training sample. The Naive Mix combines models discarded during the selection process with the best-fitting model to form an averaged probabilistic model. This is shown to improve classification accuracy when applied to the problem of determining the meaning of an ambiguous word in a sentence. A probabilistic model consists of a parametric form and parameter estimates. The form of a model describes the interactions between the features of a sentence with an ambiguous word, while the parameter estimates give the probability of observing each possible combination of feature values in a sentence. The class of models in a Naive Mix is restricted to decomposable log-linear models to reduce the model search space and simplify parameter estimation. The form of a decomposable model can be represented by an undirected graph whose nodes represent features and whose edges represent the interactions between features. The parameter estimates are the product of the marginal distributions defined by the maximal cliques in the graph of the model. Model selection integrates a search strategy with an evaluation criterion. The search strategy determines which decomposable models are evaluated during the selection process, while the evaluation criterion measures the fit of each model to the training sample. (Pedersen, Bruce, & Wiebe 1997) report that the strategy of forward sequential search (FSS) with evaluation by Akaike's Information Criterion (AIC) selects models that serve as accurate classifiers for word-sense disambiguation. Here, this combination is shown to result in Naive Mixes that improve the accuracy of disambiguation over single selected models.
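To make the clique-product parameterization concrete, the following is a minimal sketch, not taken from the paper: it assumes two hypothetical features f1 and f2 plus the sense tag, with toy counts. For a decomposable model whose maximal cliques are {f1, sense} and {f2, sense}, the maximum-likelihood joint estimate is the product of the clique marginals divided by the marginal of their separator {sense}.

```python
from collections import Counter

# Toy training sample of (f1, f2, sense) observations for an ambiguous
# word; feature names and values are illustrative, not the paper's data.
data = [
    ("noun", "plant", "factory"),
    ("noun", "plant", "factory"),
    ("noun", "tree",  "living"),
    ("verb", "tree",  "living"),
]
n = len(data)

# Decomposable model whose graph has maximal cliques {f1, sense} and
# {f2, sense}; the separator is {sense}, so the joint factors as
#   P(f1, f2, s) = P(f1, s) * P(f2, s) / P(s)
c13 = Counter((f1, s) for f1, _, s in data)   # clique {f1, sense}
c23 = Counter((f2, s) for _, f2, s in data)   # clique {f2, sense}
c3 = Counter(s for _, _, s in data)           # separator {sense}

def joint(f1, f2, s):
    """Maximum-likelihood estimate of P(f1, f2, s) under the model."""
    if c3[s] == 0:
        return 0.0
    return (c13[f1, s] / n) * (c23[f2, s] / n) / (c3[s] / n)
```

Because the cliques overlap only in the sense variable, these clique marginals determine a valid joint distribution (the estimates over all feature-value combinations sum to one).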
Model selection guided by FSS evaluates the fit of decomposable models at increasing levels of complexity, where complexity is defined as the number of edges in the graph of the model. The best-fitting model of complexity level i is designated the current model, m_i. The models evaluated at complexity level i+1 are generated by adding one edge to m_i and checking that the resultant model is decomposable. The evaluation begins with the model of independence, where there are no interactions between features (i = 0), and ends when none of the generated models of complexity level i+1 sufficiently improves on the fit of m_i. The result is a sequence of …
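The search loop above can be sketched as follows. This is a hypothetical implementation, not the paper's code: `fit` stands in for an evaluation criterion such as AIC (treated here as higher-is-better), and decomposability is checked via chordality, which characterizes decomposable graphical models.

```python
from itertools import combinations

def is_decomposable(nodes, edges):
    """A model is decomposable iff its graph is chordal. Chordality is
    verified with maximum cardinality search: the graph is chordal iff
    each vertex's earlier-numbered neighbours form a clique."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    weight = {v: 0 for v in nodes}
    numbered = []
    while len(numbered) < len(nodes):
        # Pick the unnumbered vertex with the most numbered neighbours.
        v = max((u for u in nodes if u not in numbered),
                key=lambda u: weight[u])
        prev = adj[v] & set(numbered)
        if any(b not in adj[a] for a, b in combinations(prev, 2)):
            return False
        numbered.append(v)
        for u in adj[v]:
            weight[u] += 1
    return True

def forward_sequential_search(features, fit):
    """FSS sketch: start from the model of independence (no edges) and,
    at each step, add the single edge that most improves `fit` while
    keeping the model decomposable; stop when no addition improves on
    the current model m_i. Returns the sequence m_0, m_1, ... whose
    averaged parameter estimates would form the Naive Mix."""
    possible = [frozenset(e) for e in combinations(features, 2)]
    current, score = set(), fit(set())
    sequence = [set()]
    while True:
        best_edge, best_score = None, score
        for edge in possible:
            if edge in current:
                continue
            cand = current | {edge}
            if not is_decomposable(features, [tuple(e) for e in cand]):
                continue
            s = fit(cand)
            if s > best_score:
                best_edge, best_score = edge, s
        if best_edge is None:          # no sufficient improvement: stop
            return sequence
        current, score = current | {best_edge}, best_score
        sequence.append(set(current))
```

Each model kept in `sequence` is the best-fitting model at its complexity level; the Naive Mix would average the parameter estimates of all of them rather than keep only the last.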
[1] Ted Pedersen et al. A New Supervised Learning Algorithm for Word Sense Disambiguation. AAAI/IAAI, 1997.
[2] Ted Pedersen et al. Sequential Model Selection for Word Sense Disambiguation. ANLP, 1997.