The Bethe approximation of the pattern maximum likelihood distribution

Among all memoryless source distributions, the pattern maximum likelihood (PML) distribution is the one that maximizes the probability that a memoryless source produces a string with a given pattern. Equivalently, the PML distribution maximizes the permanent of a certain non-negative matrix. We reformulate this maximization problem as a double minimization problem of a suitable Gibbs free energy function. Because finding the minimum of this function appears to be intractable for practically relevant problem sizes, one must look for tractable approximations. One approach is to find, approximately, a minimum (or at least a local minimum) of the Gibbs free energy function by applying an alternating minimization algorithm whose steps are based on quantities obtained by Markov chain Monte Carlo sampling. One can show that this approach is equivalent to an algorithm proposed by Orlitsky et al. An alternative approach is to replace the Gibbs free energy function by a tractable approximation, such as the Bethe free energy function, and to apply an alternating minimization algorithm to this function. Empirically, this latter approach gives very good approximations to the PML distribution (or at least to a locally optimal PML distribution). For the same level of accuracy, it is also two to three orders of magnitude faster than the former approach for practically relevant problem sizes. Moreover, the free energy framework allows us to simplify some earlier proofs of properties of the PML distribution and to derive some new properties of the PML distribution. It also yields similar results for the Bethe approximation.
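The link between pattern probabilities and permanents can be checked on toy instances. The following is a minimal Python sketch built on one concrete construction that we chose for illustration; the paper's matrix may be defined or normalized differently. Take a pattern ψ with m distinct symbols of multiplicities μ_1, ..., μ_m and a distribution p over k ≥ m letters. Let θ be the k x k matrix with θ_ij = p_i^(μ_j), where μ_j = 0 for the padded columns j > m. Every injective assignment of pattern symbols to letters then appears (k - m)! times in perm(θ), so the pattern probability equals perm(θ) / (k - m)!. All function names are hypothetical. The naive permanent below costs O(k · k!) operations, which is why exact evaluation is feasible only for very small alphabets.

```python
from itertools import permutations, product
from math import factorial, prod


def pattern_of(s):
    """Replace each symbol by the (1-based) order of its first appearance."""
    first_seen = {}
    out = []
    for ch in s:
        if ch not in first_seen:
            first_seen[ch] = len(first_seen) + 1
        out.append(first_seen[ch])
    return tuple(out)


def multiplicities(pattern):
    """mu_j = number of occurrences of pattern symbol j, for j = 1..m."""
    m = max(pattern)
    return [pattern.count(j) for j in range(1, m + 1)]


def permanent(theta):
    """Naive permanent: sum over all k! permutations (tiny k only)."""
    k = len(theta)
    return sum(
        prod(theta[i][sigma[i]] for i in range(k))
        for sigma in permutations(range(k))
    )


def pattern_prob_via_permanent(p, pattern):
    """Illustrative assumption: theta[i][j] = p_i ** mu_j, padded with mu_j = 0.

    Each injective map from the m pattern symbols to the k letters is counted
    (k - m)! times in the permanent, once per arrangement of the padded columns.
    """
    mu = multiplicities(pattern)
    k, m = len(p), len(mu)
    if m > k:
        return 0.0  # more distinct pattern symbols than letters: impossible
    mu = mu + [0] * (k - m)
    theta = [[p_i ** mu_j for mu_j in mu] for p_i in p]
    return permanent(theta) / factorial(k - m)


def pattern_prob_brute_force(p, pattern):
    """Sum the i.i.d. probability of every length-n string with this pattern."""
    n, k = len(pattern), len(p)
    total = 0.0
    for s in product(range(k), repeat=n):
        if pattern_of(s) == pattern:
            total += prod(p[c] for c in s)
    return total


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    psi = (1, 1, 2)  # the pattern of, e.g., "aab"
    print(pattern_prob_via_permanent(p, psi), pattern_prob_brute_force(p, psi))
```

Take p = (0.5, 0.3, 0.2) and the pattern 1 1 2. Both functions should then agree, up to floating-point rounding, with the sum over i ≠ j of p_i^2 p_j, which is 0.22. For the uniform distribution on two letters, the same pattern has probability exactly 0.25.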

[1] Sanjeev R. Kulkarni, et al. A Better Good-Turing Estimator for Sequence Probabilities, 2007, 2007 IEEE International Symposium on Information Theory.

[2] Alon Orlitsky, et al. The maximum likelihood probability of skewed patterns, 2009, 2009 IEEE International Symposium on Information Theory.

[3] Alon Orlitsky, et al. On Modeling Profiles Instead of Values, 2004, UAI.

[4] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[5] Michael Chertkov, et al. Computing the Permanent with Belief Propagation, 2011, ArXiv.

[6] Michael Chertkov, et al. Belief propagation and loop calculus for the permanent of a non-negative matrix, 2009, ArXiv.

[7] Michael Chertkov, et al. Belief Propagation and Beyond for Particle Tracking, 2008, ArXiv.

[8] P. O. Vontobel, et al. The Bethe Permanent of a Nonnegative Matrix, 2011, IEEE Transactions on Information Theory.

[9] Leonid Gurvits, et al. Unleashing the power of Schrijver's permanental inequality with the help of the Bethe Approximation, 2011, Electron. Colloquium Comput. Complex.

[10] A. Orlitsky, et al. On estimating the probability multiset, 2011.

[11] Bert Huang, et al. Approximating the Permanent with Belief Propagation, 2009, ArXiv.

[12] Pascal O. Vontobel, et al. The Bethe permanent of a non-negative matrix, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[13] Gregory Valiant, et al. Estimating the unseen: A sublinear-sample canonical estimator of distributions, 2010, Electron. Colloquium Comput. Complex.

[14] Alon Orlitsky, et al. Algorithms for modeling distributions over large alphabets, 2004, International Symposium on Information Theory, 2004. ISIT 2004. Proceedings.

[15] Sanjeev R. Kulkarni, et al. Probability Estimation in the Rare-Events Regime, 2011, IEEE Transactions on Information Theory.

[16] M. Chertkov, et al. Inference in particle tracking experiments by passing messages between images, 2009, Proceedings of the National Academy of Sciences.

[17] Gil I. Shamir. Universal Lossless Compression With Unknown Alphabets - The Average Case, 2006, IEEE Transactions on Information Theory.

[18] William T. Freeman, et al. Constructing free-energy approximations and generalized belief propagation algorithms, 2005, IEEE Transactions on Information Theory.

[19] Pascal O. Vontobel, et al. Counting in Graph Covers: A Combinatorial Characterization of the Bethe Entropy Function, 2010, IEEE Transactions on Information Theory.

[20] Alon Orlitsky, et al. Recent results on pattern maximum likelihood, 2009, 2009 IEEE Information Theory Workshop on Networking and Information Theory.

[21] I. Olkin, et al. Inequalities: Theory of Majorization and Its Applications, 1980.

[22] Alon Orlitsky, et al. Universal compression of memoryless sources over unknown alphabets, 2004, IEEE Transactions on Information Theory.