Optimal Decisions from Probabilistic Models: The Intersection-over-Union Case

A probabilistic model allows us to reason about the world and make statistically optimal decisions using Bayesian decision theory. In practice, however, the intractability of the decision problem forces us to adopt simplistic loss functions such as the 0/1 loss or the Hamming loss, and as a result we make poor decisions through MAP estimates or through low-order marginal statistics. In this work we investigate optimal decision making for more realistic loss functions. Specifically, we consider the popular intersection-over-union (IoU) score used in image segmentation benchmarks and show that it results in a hard combinatorial decision problem. To make this problem tractable, we propose a statistical approximation to the objective function, as well as an approximate algorithm based on parametric linear programming. We apply the algorithm to three benchmark datasets and obtain improved intersection-over-union scores compared to maximum-posterior-marginal decisions. Our work points out the difficulties of using realistic loss functions with probabilistic computer vision models.
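
To make the gap between maximum-posterior-marginal decisions and IoU-aware decisions concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: a single binary foreground class, independent per-pixel marginals `p`, and the expected IoU approximated by a ratio of expectations. It is not the parametric linear programming algorithm proposed here, and all function names are hypothetical. Under these assumptions the best prediction of a given size is the set of highest-marginal pixels, so a scan over sorted prefixes maximizes the approximate objective.

```python
import numpy as np

def iou(pred, truth):
    """Intersection-over-union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return np.logical_and(pred, truth).sum() / union

def approx_expected_iou(pred, p):
    """Ratio-of-expectations surrogate: E[|pred AND Y|] / E[|pred OR Y|].

    Assumes independent pixels with P(Y_i = 1) = p[i] (an illustrative assumption).
    """
    pred = np.asarray(pred, dtype=bool)
    p = np.asarray(p, dtype=float)
    num = p[pred].sum()                  # expected intersection size
    den = pred.sum() + p[~pred].sum()    # expected union size
    return num / den if den > 0 else 1.0

def mpm_decision(p):
    """Maximum-posterior-marginal decision: threshold each marginal at 0.5."""
    return np.asarray(p, dtype=float) > 0.5

def ratio_decision(p):
    """Maximize the surrogate by scanning prefixes of pixels sorted by marginal."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(-p)               # highest marginals first
    csum = np.cumsum(p[order])           # expected intersection for each prefix size
    k = np.arange(1, p.size + 1)
    scores = csum / (k + p.sum() - csum) # surrogate value for each prefix size
    best_k = int(np.argmax(scores)) + 1
    pred = np.zeros(p.size, dtype=bool)
    pred[order[:best_k]] = True
    return pred

if __name__ == "__main__":
    p = np.full(10, 0.4)                 # every pixel is foreground with probability 0.4
    for name, decide in [("MPM", mpm_decision), ("ratio", ratio_decision)]:
        pred = decide(p)
        print(name, int(pred.sum()), approx_expected_iou(pred, p))
```

In this toy case every marginal is below 0.5, so the thresholded decision labels nothing as foreground and its surrogate score is zero, while the prefix scan labels every pixel and reaches a surrogate score of 0.4. The example only illustrates why a pixelwise threshold can be a poor choice for IoU; it says nothing about how close the surrogate is to the true expected IoU.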
