Discriminatively Trained Dense Surface Normal Estimation

In this work we propose the method for a rather unexplored problem of computer vision - discriminatively trained dense surface normal estimation from a single image. Our method combines contextual and segment-based cues and builds a regressor in a boosting framework by transforming the problem into the regression of coefficients of a local coding. We apply our method to two challenging data sets containing images of man-made environments, the indoor NYU2 data set and the outdoor KITTI data set. Our surface normal predictor achieves results better than initially expected, significantly outperforming state-of-the-art.

[1]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[2]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Antonin Chambolle,et al.  A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging , 2011, Journal of Mathematical Imaging and Vision.

[7]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[9]  KeeChang Lee,et al.  Fast Automatic Single-View 3-d Reconstruction of Urban Scenes , 2008, ECCV.

[10]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Stewart Burn,et al.  Superpixels via pseudo-Boolean optimization , 2011, 2011 International Conference on Computer Vision.

[12]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[13]  Olivier D. Faugeras,et al.  Shape From Shading , 2006, Handbook of Mathematical Models in Computer Vision.

[14]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[15]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[16]  Honglak Lee,et al.  A Dynamic Bayesian Network Model for Autonomous 3D Reconstruction from a Single Indoor Image , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Ian D. Reid,et al.  A Dynamic Programming Approach to Reconstructing Building Interiors , 2010, ECCV.

[18]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[19]  Jitendra Malik,et al.  Contour and Texture Analysis for Image Segmentation , 2001, International Journal of Computer Vision.

[20]  Martial Hebert,et al.  Data-Driven 3D Primitives for Single Image Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[23]  Tsuhan Chen,et al.  Learning class-specific affinities for image labelling , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[25]  T. Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[27]  Karl Kunisch,et al.  Total Generalized Variation , 2010, SIAM J. Imaging Sci..

[28]  Ashutosh Saxena,et al.  3-D Depth Reconstruction from a Single Still Image , 2007, International Journal of Computer Vision.

[29]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[30]  Ian D. Reid,et al.  Growing semantically meaningful models for visual SLAM , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Bill Triggs,et al.  Visual Recognition Using Local Quantized Patterns , 2012, ECCV.

[32]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[34]  Kiyoharu Aizawa,et al.  Photometric Stereo Using Constrained Bivariate Regression for General Isotropic Surfaces , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[37]  Stephen Gould,et al.  Single image depth estimation from predicted semantic labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[39]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Lin Yang,et al.  Multiple Class Segmentation Using A Unified Framework over Mean-Shift Patches , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[42]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[43]  Joost van de Weijer,et al.  Fusing Global and Local Scale for Semantic Image Segmentation , 2011 .

[44]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[45]  Ashutosh Saxena,et al.  Learning 3-D Scene Structure from a Single Still Image , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[46]  Isabelle Guyon,et al.  Automatic Capacity Tuning of Very Large VC-Dimension Classifiers , 1992, NIPS.

[47]  Cordelia Schmid,et al.  Object Recognition by Integrating Multiple Image Segmentations , 2008, ECCV.

[48]  Serge J. Belongie,et al.  Object categorization using co-occurrence, location and appearance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  David J. Kriegman,et al.  Beyond Lambert: reconstructing specular surfaces using color , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[50]  Joost van de Weijer,et al.  Harmony Potentials , 2011, International Journal of Computer Vision.

[51]  Alexei A. Efros,et al.  Closing the loop in scene interpretation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.