Total positivity in exponential families with application to binary variables

We study exponential families of distributions that are multivariate totally positive of order 2 (MTP2), show that these are convex exponential families, and derive conditions for the existence of the maximum likelihood estimator (MLE). Quadratic exponential families of MTP2 distributions contain attractive Gaussian graphical models and ferromagnetic Ising models as special examples. We show that these families are obtained by intersecting the space of canonical parameters with a polyhedral cone whose faces correspond to conditional independence relations. Hence the MTP2 constraint acts as an implicit regularizer for quadratic exponential families and leads to sparsity in the estimated graphical model. We prove that the MLE in an MTP2 binary exponential family exists if and only if both of the sign patterns $(1,-1)$ and $(-1,1)$ are represented in the sample for every pair of variables; in particular, the MLE may exist with only $n=d$ observations, in stark contrast to unrestricted binary exponential families, where $2^d$ observations are required. Finally, we provide a novel and globally convergent algorithm, similar to iterative proportional scaling, for computing the MLE for MTP2 Ising models, and apply it to the analysis of data from two psychological disorders.
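The existence condition stated above can be checked directly on a sample. The following is a minimal sketch, assuming the observations are coded as $\pm 1$ and stored as a list of rows; the function name `pairwise_sign_condition` is hypothetical and not taken from the paper. It only tests the stated sign-pattern condition and does not compute the MLE.

```python
from itertools import combinations


def pairwise_sign_condition(samples):
    """Check the sign-pattern condition from the abstract.

    `samples` is a sequence of observations, each a sequence of d values
    in {-1, +1} (assumed coding). Returns True if, for every pair of
    variables (i, j), both patterns (1, -1) and (-1, 1) occur in the sample.
    """
    rows = [tuple(row) for row in samples]
    if not rows:
        raise ValueError("need at least one observation")
    d = len(rows[0])
    for row in rows:
        if len(row) != d or any(v not in (-1, 1) for v in row):
            raise ValueError("each observation must have d entries in {-1, +1}")

    for i, j in combinations(range(d), 2):
        # Collect the joint sign patterns observed for the pair (i, j).
        patterns = {(row[i], row[j]) for row in rows}
        if (1, -1) not in patterns or (-1, 1) not in patterns:
            return False
    return True


# Illustration with n = d = 4: observation k is +1 in coordinate k, -1 elsewhere.
d = 4
sample = [[1 if k == i else -1 for k in range(d)] for i in range(d)]
condition_holds = pairwise_sign_condition(sample)
```

In this illustration every pair of variables sees both mixed patterns, so the condition holds with as many observations as variables, which matches the $n=d$ remark in the abstract.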
