A Faster Iterative Scaling Algorithm for Conditional Exponential Model

The conditional exponential model is one of the most widely used conditional models in machine learning, and improved iterative scaling (IIS) is one of the major algorithms for finding its optimal parameters. In this paper, we propose a faster iterative scaling algorithm, named FIS, that finds the optimal parameters more quickly than IIS. Theoretical analysis shows that the proposed algorithm yields a tighter bound than the traditional IIS algorithm. Empirical studies on text classification over three different datasets show that the new algorithm converges substantially faster than both IIS and the conjugate gradient (CG) algorithm. Furthermore, we examine the quality of the parameters found by each learning algorithm when training is incomplete. Experiments show that, when only a limited amount of computation is allowed (e.g., training stops before convergence), FIS obtains lower test errors than both the IIS and CG methods.
