A 2D + 3D rich data approach to scene understanding

On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes - a kitchen, an elevator, your office - and you effortlessly recognize them and perceive their 3D structure. Yet this one-minute scene-understanding problem has been an open challenge in computer vision since the field was established 50 years ago. In this dissertation, we aim to rethink the path researchers have taken over these years, challenge the standard practices and implicit assumptions of current research, and redefine several basic principles of computational scene understanding. The key idea of this dissertation is that learning from rich data under natural settings is crucial for finding the right representation for scene understanding. First, to overcome the limitations of object-centric datasets, we built the Scene Understanding (SUN) Database, a large collection of real-world images that exhaustively spans all scene categories. This scene-centric dataset provides a more natural sample of the human visual world and establishes a realistic benchmark for standard 2D recognition tasks. However, while an image is a 2D array, the world is 3D and our eyes see it from a particular viewpoint; neither fact is traditionally modeled. To obtain a high-level 3D understanding, we reintroduce geometric figures using modern machinery. To model scene viewpoint, we propose a panoramic place representation that goes beyond aperture computer vision and uses data close to the natural input of the human visual system. This paradigm shift toward rich representations also opens up new challenges that require a new kind of big data - data with extra descriptions, namely rich data. Specifically, we focus on a highly valuable kind of rich data - multiple viewpoints in 3D - and we build the SUN3D database to obtain an integrated place-centric representation of scenes.
We argue for the great importance of modeling the computer's role as an agent in a 3D scene, and demonstrate the power of place-centric scene representation. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs)
