Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions

In this work we develop efficient methods for learning random maximum a-posteriori (MAP) predictors for structured labeling problems. In particular, we construct posterior distributions over perturbations that can be adjusted via stochastic gradient methods. We show that any smooth posterior distribution suffices to define a smooth PAC-Bayesian risk bound that is amenable to gradient-based optimization. In addition, we relate the choice of posterior distribution to the computational properties of the underlying MAP predictor. We propose multiplicative posteriors for learning supermodular potential functions, which admit specialized MAP solvers such as graph-cuts. We also describe label-augmented posterior models that can exploit efficient MAP approximations, such as those arising from linear programming relaxations.
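To make the construction concrete, the following is a minimal sketch of a random MAP predictor trained by stochastic gradient descent, under several simplifying assumptions not taken from the paper: a toy chain model whose MAP labeling is found by exhaustive enumeration (standing in for graph-cuts or an LP relaxation), a Gaussian posterior over additive perturbations of the unary potentials, a non-decomposable zero-one loss on the whole labeling, and a score-function (REINFORCE-style) gradient estimator. All names (`map_predict`, `grad_estimate`, and so on) are illustrative, not from the paper.

```python
import itertools
import numpy as np

# Toy chain model: n binary variables, unary potentials theta (n x 2),
# and a fixed pairwise agreement bonus. MAP is computed by exhaustive
# enumeration only to keep the sketch self-contained; the paper's
# setting would use graph-cuts or an LP relaxation instead.

def score(y, theta, pairwise=0.5):
    s = sum(theta[i, yi] for i, yi in enumerate(y))
    s += pairwise * sum(y[i] == y[i + 1] for i in range(len(y) - 1))
    return s

def map_predict(theta, pairwise=0.5):
    n = theta.shape[0]
    return max(itertools.product((0, 1), repeat=n),
               key=lambda y: score(y, theta, pairwise))

def zero_one_loss(y, y_true):
    # Non-decomposable: the loss depends on the whole labeling at once.
    return float(tuple(y) != tuple(y_true))

# Random MAP predictor: perturb the unary potentials with Gaussian noise
# whose mean mu parameterizes the posterior q(gamma | mu), then return
# the MAP labeling of the perturbed model.
def sample_loss(mu, theta, y_true, sigma=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    gamma = mu + sigma * rng.standard_normal(theta.shape)
    y_hat = map_predict(theta + gamma)
    return zero_one_loss(y_hat, y_true), gamma

# Score-function estimator of the gradient of the expected loss w.r.t.
# mu: E_q[loss * grad_mu log q(gamma | mu)], where for a Gaussian
# grad_mu log N(gamma; mu, sigma^2) = (gamma - mu) / sigma^2.
def grad_estimate(mu, theta, y_true, sigma=1.0, samples=64, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    g = np.zeros_like(mu)
    for _ in range(samples):
        loss, gamma = sample_loss(mu, theta, y_true, sigma, rng)
        g += loss * (gamma - mu) / sigma**2
    return g / samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.standard_normal((5, 2))
    y_true = (0, 0, 1, 1, 1)
    mu = np.zeros_like(theta)
    for step in range(200):
        mu -= 0.5 * grad_estimate(mu, theta, y_true, rng=rng)
    losses = [sample_loss(mu, theta, y_true, rng=rng)[0] for _ in range(100)]
    print("expected zero-one loss:", np.mean(losses))
```

The score-function estimator only ever evaluates the loss of sampled MAP labelings, so it never has to differentiate through the argmax; this mirrors the abstract's point that smoothness of the posterior, rather than of the predictor, is what makes the risk amenable to gradient methods.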
