ℋC-search for structured prediction in computer vision

The mainstream approach to structured prediction problems in computer vision is to learn an energy function such that the solution minimizes that function. At prediction time, this approach must solve an often-challenging optimization problem. Search-based methods provide an alternative that has the potential to achieve higher performance. These methods learn to control a search procedure that constructs and evaluates candidate solutions. The recently-developed ℋC-Search method has been shown to achieve state-of-the-art results in natural language processing, but mixed success when applied to vision problems. This paper studies whether ℋC-Search can achieve similarly competitive performance on basic vision tasks such as object detection, scene labeling, and monocular depth estimation, where the leading paradigm is energy minimization. To this end, we introduce a search operator suited to the vision domain that improves a candidate solution by probabilistically sampling likely object configurations in the scene from the hierarchical Berkeley segmentation. We complement this search operator by applying the DAgger algorithm to robustly train the search heuristic so it learns from its previous mistakes. Our evaluation shows that these improvements reduce the branching factor and search depth, and thus give a significant performance boost. Our state-of-the-art results on scene labeling and depth estimation suggest that ℋC-Search provides a suitable tool for learning and inference in vision.

[1]  Honglak Lee,et al.  Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Alan Fern,et al.  Speedup Learning , 2010, Encyclopedia of Machine Learning.

[3]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Ashutosh Saxena,et al.  3-D Depth Reconstruction from a Single Still Image , 2007, International Journal of Computer Vision.

[5]  Ben Taskar,et al.  Structured Prediction Cascades , 2010, AISTATS.

[6]  Iasonas Kokkinos,et al.  Shape grammar parsing via Reinforcement Learning , 2011, CVPR 2011.

[7]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.

[8]  Sanja Fidler,et al.  Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Andrew Zisserman,et al.  Pylon Model for Semantic Segmentation , 2011, NIPS.

[10]  Thomas G. Dietterich,et al.  Learning Greedy Policies for the Easy-First Framework , 2015, AAAI.

[11]  Jitendra Malik,et al.  Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Martial Hebert,et al.  Stacked Hierarchical Labeling , 2010, ECCV.

[13]  Ben Taskar,et al.  Sidestepping Intractable Inference with Structured Ensemble Cascades , 2010, NIPS.

[14]  Pradeep Ravikumar,et al.  Quadratic programming relaxations for metric labeling and Markov random field MAP estimation , 2006, ICML.

[15]  Pablo Andrés Arbeláez,et al.  Boundary Extraction in Natural Images Using Ultrametric Contour Maps , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[16]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[17]  Long Zhu,et al.  Recursive segmentation and recognition templates for image parsing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Daphne Koller,et al.  Efficiently selecting regions for scene understanding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[20]  David A. McAllester,et al.  The Generalized A* Architecture , 2007, J. Artif. Intell. Res..

[21]  J. Andrew Bagnell,et al.  Efficient Reductions for Imitation Learning , 2010, AISTATS.

[22]  Gregory Shakhnarovich,et al.  Discriminative Re-ranking of Diverse Segmentations , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Marc Toussaint,et al.  Message-Passing Algorithms for MAP Estimation Using DC Programming , 2012, AISTATS.

[24]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[25]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Ce Liu,et al.  Depth Extraction from Video Using Non-parametric Sampling , 2012, ECCV.

[27]  Alan Fern,et al.  HC-Search: A Learning Framework for Search-based Structured Prediction , 2014, J. Artif. Intell. Res..

[28]  Sinisa Todorovic,et al.  Scene Labeling Using Beam Search under Mutex Constraints , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Gregory Shakhnarovich,et al.  Diverse M-Best Solutions in Markov Random Fields , 2012, ECCV.

[31]  Puneet Gupta,et al.  Beam search for feature selection in automatic SVM defect classification , 2002, Object recognition supported by user interaction for service robots.

[32]  Martial Hebert,et al.  Activity Forecasting , 2012, ECCV.

[33]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[34]  Iasonas Kokkinos,et al.  Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound , 2011, NIPS.

[35]  Mohamed R. Amer,et al.  Cost-Sensitive Top-Down/Bottom-Up Inference for Multiscale Activity Recognition , 2012, ECCV.

[36]  Ashutosh Saxena,et al.  Make3D: Learning 3D Scene Structure from a Single Still Image , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Ben Taskar,et al.  SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Thomas G. Dietterich,et al.  Learning to Detect Basal Tubules of Nematocysts in SEM Images , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[39]  Shlomo Zilberstein,et al.  Message-Passing Algorithms for Quadratic Programming Formulations of MAP Estimation , 2011, UAI.

[40]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Fei-Fei Li,et al.  Image Segmentation with Topic Random Field , 2010, ECCV.

[42]  Sinisa Todorovic,et al.  SLEDGE: Sequential Labeling of Image Edges for Boundary Detection , 2013, International Journal of Computer Vision.

[43]  Stephen Gould,et al.  Single image depth estimation from predicted semantic labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Christoph Schnörr,et al.  MAP-Inference for Highly-Connected Graphs with DC-Programming , 2008, DAGM-Symposium.

[45]  Adrian Barbu,et al.  Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Alan Fern,et al.  Structured prediction via output space search , 2014, J. Mach. Learn. Res..

[47]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[48]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.