The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expense of model building. The iterative scaling algorithm used for parameter estimation is computationally costly, and the feature selection process requires the model parameters to be re-estimated many times for many candidate features. In this paper we present a novel approach to building maximum entropy models. Our approach uses a feature collocation lattice and selects the atomic features without resorting to iterative scaling. After the atomic features have been selected, we use iterative scaling to compile a fully saturated model for the maximal constraint space and then begin to eliminate the most specific constraints. Since at every point during constraint deselection we have a fully fitted maximum entropy model, we rank the constraints on the basis of their weights in that model. We therefore do not have to run iterative scaling during constraint ranking and apply it only for linear model regression. Another important improvement is that, since the simplified model deviates from the previous, larger model in only a small number of constraints, we use the parameters of the old model as the initial values of the parameters for the iterative scaling of the new one. This proved to decrease the number of required iterations by about tenfold. As practical results, we discuss how our method has been applied to several language-modelling tasks such as sentence boundary disambiguation, part-of-speech tagging and automatic document abstracting.
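To make the warm-start idea concrete, the following is a minimal sketch of Generalized Iterative Scaling for a conditional maximum entropy model, written in Python with NumPy. The function name `train_gis`, the array layout of `F`, and the constant `C` are assumptions for illustration, not the paper's implementation; the point is the optional `weights` argument, which lets a re-fit after constraint deselection start from the previous model's parameters instead of from zero.

```python
import numpy as np

def train_gis(F, y, C, weights=None, n_iter=200, tol=1e-6):
    """Generalized Iterative Scaling for a conditional maximum entropy model (sketch).

    F[n, c, k] : value of feature k when training context n is paired with class c
    y[n]       : observed class index for context n
    C          : constant total feature count per (context, class) pair
                 (pad with a slack feature beforehand so this holds)
    weights    : optional warm-start parameters, e.g. taken from the
                 previous, larger model after constraints have been dropped
    """
    N, _, K = F.shape
    lam = np.zeros(K) if weights is None else weights.copy()

    # Empirical feature expectations over the training events.
    emp = F[np.arange(N), y].sum(axis=0) / N

    for _ in range(n_iter):
        scores = F @ lam                                   # (N, n_classes) log-scores
        p = np.exp(scores - scores.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)                  # conditional class probabilities
        exp = np.einsum('nc,nck->k', p, F) / N             # model feature expectations
        delta = np.log(np.maximum(emp, 1e-12) / np.maximum(exp, 1e-12)) / C
        lam += delta
        if np.abs(delta).max() < tol:                      # converged
            break
    return lam

# Warm start: after deselecting constraints, re-fit from the old weights
# (restricted to the surviving features) rather than from zero, e.g.
# new_lam = train_gis(F_new, y, C, weights=old_lam[kept_feature_indices])
```

Because the simplified model differs from its predecessor in only a few constraints, the warm-started run begins close to the new optimum, which is the effect the abstract credits with the roughly tenfold reduction in iterations.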