Whole-sentence maximum entropy models directly model the probability of a sentence using features, i.e., arbitrary computable properties of the sentence. We investigate whether linguistic features that capture the underlying linguistic structure of a sentence can improve language modeling. We use a shallow parser to parse sentences into linguistic constituents in two corpora: the original training corpus and an artificial corpus generated from an initial trigram model. We define three sets of candidate linguistic features based on these constituents, compute the prevalence of each feature in the two data sets, and select features whose frequencies differ significantly between them. These features correspond to phenomena poorly modeled by traditional trigrams and reveal interesting linguistic deficiencies of the initial model. We find 6798 such linguistic features in the Switchboard domain and achieve small improvements in perplexity and speech recognition accuracy with them.
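To make the selection step concrete, the sketch below compares a candidate feature's sentence-level frequency in the real training corpus against the corpus sampled from the trigram model, and keeps the feature only if the difference is statistically significant. The specific test (a two-proportion z-test), the sentence representation (a list of chunk-label/text pairs from the shallow parser), and the `adjacent_np_feature` example are illustrative assumptions, not the paper's exact procedure.

```python
import math

def feature_count(corpus, feature):
    """Count sentences in `corpus` for which the binary feature fires."""
    return sum(1 for sentence in corpus if feature(sentence))

def significantly_different(feature, real_corpus, sampled_corpus, z_threshold=3.0):
    """Two-proportion z-test: does the feature's sentence-level frequency differ
    between the real training corpus and the trigram-sampled corpus?"""
    n1, n2 = len(real_corpus), len(sampled_corpus)
    k1, k2 = feature_count(real_corpus, feature), feature_count(sampled_corpus, feature)
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    return abs(p1 - p2) / se > z_threshold

# Hypothetical candidate feature: fires when a sentence contains two adjacent
# noun-phrase chunks, as labelled by the shallow parser. Each parsed sentence
# is assumed to be a list of (chunk_label, chunk_text) pairs.
def adjacent_np_feature(parsed_sentence):
    labels = [label for label, _ in parsed_sentence]
    return any(a == "NP" and b == "NP" for a, b in zip(labels, labels[1:]))
```

Features that survive this test would then receive weights in the whole-sentence maximum entropy model, which scales the initial trigram probability of a sentence by an exponential factor over the selected features.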