Entropy-Based Latent Structured Output Prediction

Recently, several generalizations of the popular latent structural SVM framework have been proposed in the literature. Broadly speaking, these generalizations fall into two categories: (i) those that predict the output variables while either marginalizing the latent variables or estimating their most likely values, and (ii) those that predict the output variables by minimizing an entropy-based uncertainty measure over the latent space. To aid their application in computer vision, we study these generalizations with the aim of identifying their strengths and weaknesses. To this end, we propose a novel prediction criterion that includes as special cases all prediction criteria previously used in the literature. Specifically, the proposed prediction criterion minimizes the Aczél and Daróczy entropy of the output. This in turn allows us to design a learning objective that provides a unified framework (UF) for latent structured prediction. We develop a single optimization algorithm and show empirically that it is as effective as the more complex approaches previously employed for latent structured prediction. Using this algorithm, we provide empirical evidence that supports prediction via the minimization of the latent space uncertainty.
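To make the idea of one criterion covering both categories concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard two-parameter Aczél and Daróczy entropy of order (r, s), H = log(Σ q^r / Σ q^s) / (s − r) for r ≠ s and H = −Σ q^r log q / Σ q^r for r = s, applied to an unnormalized distribution Q_y(h) = P(y, h | x) over latent values h for each candidate output y. The function names `aczel_daroczy_entropy` and `predict`, the parameter names `r` and `s`, and the toy probability table are hypothetical. With s = 1 the formula reduces to a Rényi entropy of order r. Setting r = 0 ranks outputs by their marginal probability, provided every output has the same number of latent states with nonzero probability. A large r approaches −log max_h P(y, h | x), the "most likely latent value" rule.

```python
import math
from typing import Dict, List


def aczel_daroczy_entropy(q: List[float], r: float, s: float) -> float:
    """Aczél and Daróczy entropy of order (r, s) of a generalized distribution q.

    Assumed definition (a sketch, not taken from the paper):
      r != s:  log(sum q^r / sum q^s) / (s - r)
      r == s:  -sum(q^r * log q) / sum(q^r)
    Zero entries are dropped; q must contain at least one positive value.
    """
    q = [p for p in q if p > 0.0]
    if r == s:
        weight = sum(p ** r for p in q)
        return -sum(p ** r * math.log(p) for p in q) / weight
    return math.log(sum(p ** r for p in q) / sum(p ** s for p in q)) / (s - r)


def predict(joint: Dict[str, List[float]], r: float, s: float) -> str:
    """Hypothetical prediction rule: the output y whose Q_y(h) = P(y, h | x)
    has minimal entropy."""
    return min(joint, key=lambda y: aczel_daroczy_entropy(joint[y], r, s))


# Hypothetical joint P(y, h | x) over 2 outputs and 3 latent values (sums to 1).
# Output "b" has the larger marginal (0.6 vs 0.4);
# output "a" has the single most likely (y, h) pair (0.3 vs 0.2).
joint = {
    "a": [0.30, 0.05, 0.05],
    "b": [0.20, 0.20, 0.20],
}

print(predict(joint, r=0.0, s=1.0))   # marginalization-like ranking: b
print(predict(joint, r=50.0, s=1.0))  # close to the max-over-h ranking: a
```

On this toy table the two classical rules disagree, and sweeping `r` moves a single criterion between them. A framework built on one parameterized entropy can therefore compare the two categories under a shared learning procedure.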
