Using surfaces and surface relations in an Early Cognitive Vision system

We present a deep hierarchical visual system with two parallel hierarchies for edge and surface information. In the two hierarchies, complementary visual information is represented at different levels of granularity, together with the associated uncertainties and confidences. At all levels, geometric and appearance information is coded explicitly in 2D and 3D, which allows this information to be accessed separately and linked across levels. We demonstrate the advantages of such hierarchies in three applications covering grasping, viewpoint-independent object representation, and pose estimation.
