Hyper-parameter Learning for Sparse Structured Probabilistic Models

In this paper, we consider the estimation of hyperparameters for regularization terms commonly used to obtain structured sparse parameters in signal estimation problems, such as signal denoising. Viewing the convex regularization terms as negative log-densities, we propose approximate maximum likelihood estimation for continuous log-supermodular distributions; log-supermodularity is a key property shared by many sparse priors. We then show how "perturb-and-MAP" ideas based on the Gumbel distribution, combined with efficient discretization, can be used to approximate the log-partition function of these models, which is the crucial step in approximate maximum likelihood estimation. We illustrate our estimation procedure on a set of experiments with flow-based priors and signal denoising.
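The perturb-and-MAP idea referenced above rests on a classical identity: for a discrete model p(x) ∝ exp(θ_x), the expected maximum of the potentials perturbed by i.i.d. zero-mean Gumbel noise equals the log-partition function exactly. The sketch below illustrates only this fully-perturbed identity on a toy discrete model, not the paper's actual estimator (which relies on low-dimensional perturbations and discretization of continuous log-supermodular densities); all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Potentials theta_x of a small discrete model p(x) proportional to exp(theta_x).
theta = rng.normal(size=10)

# Exact log-partition function log Z = log sum_x exp(theta_x), for comparison.
log_Z = np.log(np.exp(theta).sum())

# Perturb-and-MAP identity: with i.i.d. zero-mean Gumbel perturbations G_x,
#   E[ max_x (theta_x + G_x) ] = log Z.
# NumPy's Gumbel(loc=0) has mean equal to the Euler-Mascheroni constant,
# so we shift the location by -euler_gamma to get zero-mean noise.
n_samples = 20_000
gumbel = rng.gumbel(loc=-np.euler_gamma, scale=1.0,
                    size=(n_samples, theta.size))
estimate = (theta + gumbel).max(axis=1).mean()

print(f"exact log Z        = {log_Z:.4f}")
print(f"perturb-and-MAP MC = {estimate:.4f}")
```

In this fully-perturbed setting the identity is exact and only Monte Carlo error remains; the computational interest of perturb-and-MAP comes from settings where each maximization is a tractable MAP problem (e.g., via graph cuts for log-supermodular models) while the partition function itself is intractable.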
