On Sparse, Spectral and Other Parameterizations of Binary Probabilistic Models

This paper studies the parameterization of probability distributions over binary data sets. Several such parameterizations are known, including the Ising, generalized Ising, canonical, and full parameterizations. We also discuss a parameterization that we call the “spectral parameterization”, which has received significantly less attention in the existing literature. We give this parameterization a spectral interpretation by casting log-linear models in terms of orthogonal Walsh-Hadamard harmonic expansions. Using various standard and group-sparse regularizers for structure learning, we provide a comprehensive theoretical and empirical comparison of these parameterizations. We show that the spectral and canonical parameterizations achieve the best performance and sparsity levels, with the added advantage that the spectral parameterization does not depend on any particular reference state. The spectral interpretation also provides a new starting point for analyzing the statistics of binary data sets; we measure the magnitude of higher-order interactions in the underlying distributions of several data sets.
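To make the spectral idea concrete, here is a minimal Python/NumPy sketch, assuming the standard unnormalized Walsh-Hadamard convention (applying the transform twice multiplies by 2^n); the helper `fwht` and the toy 3-variable distribution are illustrative, not the paper's code. It expresses the log-probability vector of a binary distribution in the orthogonal Walsh basis, yielding one spectral coefficient per subset of variables, and recovers the log-probabilities exactly via the inverse transform.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform (unnormalized: applying it twice
    scales the input by 2^n)."""
    a = a.copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

# Toy distribution over n = 3 binary variables (2^3 = 8 joint states);
# the random log-potentials here stand in for any log-linear model.
rng = np.random.default_rng(0)
logp = rng.normal(size=8)
logp -= np.logaddexp.reduce(logp)   # normalize so probabilities sum to one

# Spectral coefficients: project log p onto the orthogonal Walsh basis,
# giving one coefficient per subset of the three variables.
theta = fwht(logp) / 8.0

# The transform is self-inverse up to the 2^n scaling, so the
# log-probabilities are recovered exactly from the coefficients.
assert np.allclose(fwht(theta), logp)
```

Under this indexing, the coefficient for a subset of variables captures the pure interaction of that order among them, so a sparse coefficient vector corresponds directly to a distribution with few higher-order interactions.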
