Improved Iterative Scaling Can Yield Multiple Globally Optimal Models with Radically Differing Performance Levels

Log-linear models can be efficiently estimated using algorithms such as Improved Iterative Scaling (IIS) (Lafferty et al., 1997). Under certain conditions and for a particular class of problems, IIS is guaranteed to approach both the maximum-likelihood and maximum-entropy solution. This solution, in likelihood space, is unique. Unfortunately, in realistic situations multiple solutions may exist, all equivalent in likelihood but radically different in performance. We show that this behaviour can occur when a model contains overlapping features and the training material is sparse. Experimental results from the domain of parse selection for stochastic attribute value grammars show the wide variation in performance that can arise when estimating models using IIS. Further results show that the influence of the initial model can be diminished either by selecting uniform initial weights or by model averaging.
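To illustrate why such likelihood-equivalent models can arise, here is a brief sketch of the general phenomenon under a simplifying assumption (two features that coincide on the sparse training data); it is not an equation taken from the paper itself. In a log-linear model

\[
p_\lambda(x) \;=\; \frac{\exp\bigl(\sum_i \lambda_i f_i(x)\bigr)}{\sum_{x'} \exp\bigl(\sum_i \lambda_i f_i(x')\bigr)},
\]

the training likelihood depends on the features only through their values on the training events. If two overlapping features satisfy $f_1(x) = f_2(x)$ for every event $x$ seen in training (including competing parses entering the normalisation), then replacing $(\lambda_1, \lambda_2)$ by $(\lambda_1 + \delta, \lambda_2 - \delta)$ for any $\delta$ leaves $\sum_i \lambda_i f_i(x)$, and hence the likelihood, unchanged. Every such split of the weight mass is therefore globally optimal in likelihood, yet the resulting models may rank unseen parses differently wherever $f_1$ and $f_2$ diverge, which is how equal-likelihood models can exhibit very different performance.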