Paper information - Dealing with small data and training blind spots in the Manhattan world

Dealing with small data and training blind spots in the Manhattan world

Leveraging the Manhattan assumption, we generate metrically rectified novel views from a single image, even for non-box scenarios. Our novel views enable already trained classifiers to handle views missing from the training data (blind spots) without additional training. We demonstrate this on end-to-end scene text spotting under perspective. Additionally, using our fronto-parallel views, we discover unsupervised invariant mid-level patches given a few widely separated training examples (the small data domain). These invariant patches outperform various baselines on a small data image retrieval challenge.
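To make the idea of a metrically rectified, fronto-parallel view concrete, here is a minimal sketch of one standard construction; it is not taken from the paper and may differ from the authors' pipeline. It assumes a pinhole camera with known focal length `f` and principal point `(cx, cy)`, zero skew, and two vanishing points `v1`, `v2` (homogeneous pixel coordinates) for the two orthogonal Manhattan directions lying in the plane to be rectified. All function names are hypothetical.

```python
import math

def mat_vec(m, v):
    # 3x3 matrix times 3-vector
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def mat_mul(a, b):
    # 3x3 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def intrinsics(f, cx, cy):
    # Assumed pinhole model: square pixels, zero skew.
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

def rectifying_homography(v1, v2, f, cx, cy):
    """Return H = K R^T K^-1, which rotates the camera to face the plane
    whose in-plane orthogonal directions vanish at v1 and v2."""
    K, K_inv = intrinsics(f, cx, cy)
    # Back-project the vanishing points to 3D directions in the camera frame.
    r1 = normalize(mat_vec(K_inv, v1))
    r2 = mat_vec(K_inv, v2)
    # Gram-Schmidt step: noisy vanishing points are rarely exactly orthogonal.
    d = sum(a * b for a, b in zip(r1, r2))
    r2 = normalize([b - d * a for a, b in zip(r1, r2)])
    r3 = cross(r1, r2)  # plane normal, up to sign
    R_t = [r1, r2, r3]  # rows are the axes, so this is R transposed
    return mat_mul(mat_mul(K, R_t), K_inv)
```

Applying `H` sends `v1` and `v2` to points at infinity along the image x and y axes, so lines in the plane that converged under perspective become parallel and axis-aligned. Warping the image with `H` (for example with any standard perspective-warp routine) would then give a fronto-parallel view, up to scale and translation.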

Martial Hebert | Javier Civera | Luis Montano | Wajahat Hussain
