Using Vocabulary Knowledge in Bayesian Multinomial Estimation

Estimating the parameters of sparse multinomial distributions is an important component of many statistical learning tasks. Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data.
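The approach the abstract describes can be illustrated as a Bayesian mixture over candidate vocabularies, in the spirit of Friedman and Singer's work on large discrete domains. The sketch below is an assumption about the general technique, not the paper's exact algorithm: each candidate vocabulary induces a symmetric Dirichlet prior over its symbols, the candidates are reweighted by their marginal likelihood given the observed counts, and the final estimate is the posterior-weighted average of the per-vocabulary posterior means. The function names, the symmetric `alpha`, and the uniform prior over candidates are all illustrative choices.

```python
# Hedged sketch of vocabulary-mixture estimation (illustrative, not the
# paper's exact method). Assumes at least one candidate vocabulary covers
# every observed symbol, a symmetric Dirichlet(alpha) prior within each
# vocabulary, and a uniform prior over the candidate vocabularies.
import math
from collections import Counter

def log_marginal(counts, vocab, alpha):
    """Log marginal likelihood of the counts under a symmetric
    Dirichlet(alpha) prior restricted to `vocab`.
    Returns -inf if an observed symbol lies outside the vocabulary."""
    if any(s not in vocab for s in counts):
        return float("-inf")
    K = len(vocab)
    N = sum(counts.values())
    ll = math.lgamma(K * alpha) - math.lgamma(N + K * alpha)
    for s in vocab:
        ll += math.lgamma(counts.get(s, 0) + alpha) - math.lgamma(alpha)
    return ll

def mixture_estimate(data, candidate_vocabs, alpha=1.0):
    """Posterior-weighted symbol-probability estimate: average the
    Dirichlet posterior means, weighting each candidate vocabulary
    by its (normalized) marginal likelihood."""
    counts = Counter(data)
    N = len(data)
    logs = [log_marginal(counts, V, alpha) for V in candidate_vocabs]
    m = max(logs)  # log-sum-exp trick for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    weights = [w / total for w in weights]
    estimate = {}
    for w, V in zip(weights, candidate_vocabs):
        K = len(V)
        for s in V:
            estimate[s] = estimate.get(s, 0.0) + \
                w * (counts.get(s, 0) + alpha) / (N + K * alpha)
    return estimate
```

A usage example: with observations `"aab"` and candidates `{a,b}` and `{a,b,c}`, the smaller vocabulary receives more posterior weight (its marginal likelihood is higher), so mass assigned to the never-seen symbol `c` is small but nonzero, which is how prior vocabulary knowledge counteracts sparsity.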
