Exact inference and learning for cumulative distribution functions on loopy graphs

Many problem domains, including climatology and epidemiology, require models that can capture both heavy-tailed statistics and local dependencies. Specifying such distributions using graphical models for probability density functions (PDFs) generally leads to intractable inference and learning. Cumulative distribution networks (CDNs) provide a means to tractably specify multivariate heavy-tailed models as a product of cumulative distribution functions (CDFs). Existing algorithms for inference and learning in CDNs are limited to tree-structured (non-loopy) graphs. In this paper, we develop inference and learning algorithms for CDNs with arbitrary topology. Our approach to inference and learning relies on recursively decomposing the computation of mixed derivatives based on a junction tree over the cumulative distribution functions. We demonstrate that our systematic approach to exploiting the sparsity represented by the junction tree yields significant performance improvements over the general symbolic differentiation programs Mathematica and D*. Using two real-world datasets, we demonstrate that non-tree-structured (loopy) CDNs provide significantly better fits to the data than tree-structured and unstructured CDNs and other heavy-tailed multivariate distributions such as the multivariate copula and logistic models.
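To make the phrase "product of CDFs" and the role of mixed derivatives concrete, the following is a minimal sketch, not the algorithm developed in the paper. It assumes a hypothetical three-variable loopy CDN with one bivariate logistic CDF per edge of a triangle; the names `phi`, `EDGES`, `joint_cdf`, `box_probability`, and `density_estimate` are illustrative choices. The sketch evaluates the joint CDF as a product of factors and approximates the density, which is the mixed derivative of the CDF with respect to all variables, by brute-force finite differences over the corners of a small box.

```python
import itertools
import math


def phi(x, y):
    """Bivariate logistic CDF, used here as an assumed CDN factor."""
    return 1.0 / (1.0 + math.exp(-x) + math.exp(-y))


# Hypothetical loopy CDN: three variables, one factor per edge of a triangle.
EDGES = [(0, 1), (1, 2), (0, 2)]


def joint_cdf(x):
    """Joint CDF F(x), defined as the product of the edge CDFs."""
    result = 1.0
    for i, j in EDGES:
        result *= phi(x[i], x[j])
    return result


def box_probability(lower, upper):
    """P(lower < X <= upper) by inclusion-exclusion over the box corners."""
    n = len(lower)
    total = 0.0
    for picks in itertools.product((0, 1), repeat=n):
        corner = [upper[k] if picks[k] else lower[k] for k in range(n)]
        # Each coordinate taken from the lower bound flips the sign.
        sign = (-1) ** (n - sum(picks))
        total += sign * joint_cdf(corner)
    return total


def density_estimate(x, h=1e-2):
    """Finite-difference estimate of the mixed derivative of F at x."""
    lower = [v - h / 2 for v in x]
    upper = [v + h / 2 for v in x]
    return box_probability(lower, upper) / h ** len(x)


if __name__ == "__main__":
    print("F(0, 0, 0)    =", joint_cdf([0.0, 0.0, 0.0]))
    print("P(box [-1,1]) =", box_probability([-1.0] * 3, [1.0] * 3))
    print("density at 0  =", density_estimate([0.0, 0.0, 0.0]))
```

The inclusion-exclusion sum touches all 2^n corners of the box, so this brute-force route scales exponentially in the number of variables. That cost is the motivation for organizing the mixed-derivative computation around the sparsity of the graph, as the abstract describes.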
