What Vision Can, Can’t and Should Do

Computer vision has come a long way since its beginnings. In this chapter, we review some of the recent successes, which seem to indicate that many aspects of vision have been solved and that the way is now paved for robotic systems that can operate freely in the real world. On closer inspection, though, that is not yet the case. A set of specialised solutions in different subareas, however impressive individually, does not constitute a unified theory of vision. We point out some of the problems of current approaches, most notably a lack of abstraction and difficulty in dealing with uncertainty. Finally, we suggest what research should and should not focus on in order to advance on a broader basis.

[33]  Paul Newman,et al.  Appearance-only SLAM at large scale with FAB-MAP 2.0 , 2011, Int. J. Robotics Res..

[34]  Danica Kragic,et al.  Active 3D scene segmentation and detection of unknown objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[35]  Vincent Lepetit,et al.  A fast local descriptor for dense matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Takeo Kanade,et al.  GPU-accelerated real-time 3D tracking for humanoid locomotion and stair climbing , 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[37]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[38]  Sven J. Dickinson,et al.  Object Categorization: The Evolution of Object Categorization and the Challenge of Image Abstraction , 2009 .

[39]  F. R. A. Hopgood,et al.  Machine Intelligence 6 , 1972, The Mathematical Gazette.

[40]  Markus Vincze,et al.  Model-based 3D object detection , 2010, Machine Vision and Applications.

[41]  Mohan M. Trivedi,et al.  Particle filtering with rendered models: A two pass approach to multi-object 3D tracking with the GPU , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[42]  Helge J. Ritter,et al.  Real-time 3D segmentation of cluttered scenes for robot grasping , 2012, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012).

[43]  Sebastian Thrun,et al.  Probabilistic robotics , 2002, CACM.

[44]  Azriel Rosenfeld,et al.  3-D Shape Recovery Using Distributed Aspect Matching , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[45]  M. Vincze,et al.  BLORT-The Blocks World Robotic Vision Toolbox , 2010 .

[46]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[48]  Siddhartha S. Srinivasa,et al.  Object recognition and full pose registration from a single image for robotic manipulation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[49]  Richard Szeliski,et al.  Building Rome in a day , 2009, ICCV.

[50]  Markus Vincze,et al.  Segmentation of unknown objects in indoor environments , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[51]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Bastian Leibe,et al.  Interleaved Object Categorization and Segmentation , 2003, BMVC.

[53]  David Marr,et al.  VISION A Computational Investigation into the Human Representation and Processing of Visual Information , 2009 .

[54]  Andrew J. Davison,et al.  Live dense reconstruction with a single moving camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[55]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[56]  Luc Van Gool,et al.  Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views , 2006, International Journal of Computer Vision.

[57]  Simon J. Thorpe,et al.  BIOLOGICAL CONSTRAINTS ON CONNECTIONIST MODELLING , 2015 .

[58]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[59]  M. B. Clowes,et al.  On Seeing Things , 1971, Artif. Intell..

[60]  David G. Lowe,et al.  What and Where: 3D Object Recognition with Accurate Pose , 2006, Toward Category-Level Object Recognition.

[61]  Vincent Lepetit,et al.  Monocular Model-based 3d Tracking of Rigid Objects (Foundations and Trends in Computer Graphics and Vision(R)) , 2005 .

[62]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[63]  Horst Bischof,et al.  A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.

[64]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[65]  Manolis I. A. Lourakis,et al.  SBA: A software package for generic sparse bundle adjustment , 2009, TOMS.

[66]  H. Intraub Rapid conceptual identification of sequentially presented pictures. , 1981 .

[67]  Luc Van Gool,et al.  Multibody Structure-from-Motion in Practice , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Diego Borro,et al.  Towards real time 3D tracking and reconstruction on a GPU using Monte Carlo simulations , 2010, 2010 IEEE International Symposium on Mixed and Augmented Reality.