Interpreting the structure of single images by learning from examples

An important problem in computer vision is interpreting the content of a single image. In our work we investigated the challenging case of recovering the underlying 3D structure of a scene from a single image by learning from training data. To this end, we developed a plane detection algorithm that finds planar surfaces in a single still image and estimates their orientation with respect to the camera. The algorithm has two stages. The first is a plane recognition stage, which classifies individual regions as planar or non-planar and estimates their orientation. The second is a segmentation stage based on a Markov random field (MRF), which finds the distinct planes in the image. We also demonstrated an application to visual odometry, where single-image plane detection allows structure-rich maps to be built quickly.

(Please note that this abstract does not appear in the submitted article itself, because the article is already an extended thesis abstract. The text above summarises the main points of the work described in our submission.)
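The sketch below illustrates how an MRF segmentation stage can turn per-region recognition outputs into distinct planes. It is not the method of the article. Every detail is an assumption made for illustration:

- Regions are a precomputed set, such as superpixels, with a neighbourhood graph.
- The recognition stage outputs, for each region, a probability of being planar and a unit surface normal.
- A fixed list of candidate plane normals (`hypotheses`) is supplied.
- The energy has two parts. The unary term is the negative log-probability plus the angular deviation from the hypothesis normal. The pairwise term is a Potts penalty between neighbouring regions.
- The energy is minimised with iterated conditional modes (ICM), a simple greedy inference method chosen here for brevity.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
EPS = 1e-6  # guards against log(0)


def angle_between(a: Vec3, b: Vec3) -> float:
    """Angle (radians) between two unit vectors, clamped for numerical safety."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def unary_cost(label: int, p_planar: float, normal: Vec3,
               hypotheses: Sequence[Vec3]) -> float:
    """Data term. Label 0 = non-planar; label k >= 1 = plane hypothesis k-1."""
    if label == 0:
        return -math.log(max(1.0 - p_planar, EPS))
    # Planar labels pay for low planarity plus the angular disagreement
    # between the region's estimated normal and the hypothesis normal.
    return (-math.log(max(p_planar, EPS))
            + angle_between(normal, hypotheses[label - 1]))


def local_energy(i, label, labels, p_planar, normals, hypotheses,
                 neighbours, smoothness):
    """Unary cost plus a Potts penalty for each neighbour with a different label."""
    pairwise = sum(smoothness for j in neighbours[i] if labels[j] != label)
    return unary_cost(label, p_planar[i], normals[i], hypotheses) + pairwise


def icm_segment(p_planar: Sequence[float], normals: Sequence[Vec3],
                hypotheses: Sequence[Vec3], neighbours: Sequence[Sequence[int]],
                smoothness: float = 1.0, max_iters: int = 20) -> List[int]:
    """Greedy MRF labelling of regions by iterated conditional modes (ICM)."""
    n = len(p_planar)
    label_set = range(len(hypotheses) + 1)
    # Initialise every region with its best label from the data term alone.
    labels = [min(label_set,
                  key=lambda lab, i=i: unary_cost(lab, p_planar[i], normals[i],
                                                  hypotheses))
              for i in range(n)]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            # Re-label region i given its neighbours' current labels.
            best = min(label_set,
                       key=lambda lab: local_energy(i, lab, labels, p_planar,
                                                    normals, hypotheses,
                                                    neighbours, smoothness))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # no region changed: a local minimum has been reached
            break
    return labels


if __name__ == "__main__":
    floor, wall = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    p = [0.9, 0.9, 0.9, 0.1]
    normals = [floor, (0.0, 0.6, 0.8), floor, wall]
    chain = [[1], [0, 2], [1, 3], [2]]  # regions 0-1-2-3 in a row
    print(icm_segment(p, normals, [floor, wall], chain))  # -> [1, 1, 1, 0]
```

The example uses four regions in a chain:

- On its own data term, region 1's noisy normal is closer to the wall hypothesis.
- The smoothness term pulls region 1 into the same floor plane as its neighbours.
- Region 3 has a low planarity probability, so it stays non-planar.
- Setting `smoothness=0` recovers the per-region decisions, `[1, 2, 1, 0]`.

ICM only reaches a local minimum. Graph cuts or belief propagation are common alternatives when a better optimum is needed.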