BIK-BUS: Biologically Motivated 3D Keypoint Based on Bottom-Up Saliency

One of the major problems in developing a 3D recognition system is the choice of keypoint detector and descriptor. To help solve this problem, we present a new method for detecting 3D keypoints on point clouds, and we benchmark every pairing of 3D keypoint detector and 3D descriptor to evaluate their performance on object and category recognition. These evaluations are carried out on a public database of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture of the primate visual system. The 3D keypoints are extracted from a bottom-up 3D saliency map, that is, a map that encodes the saliency of objects in the visual environment. The saliency map is built from conspicuity maps (each a combination across different modalities) of the orientation, intensity, and color information, computed in a bottom-up, purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the focus of attention (or keypoint location) is directed sequentially to the most salient points in this map. Inhibiting the currently attended location automatically allows the system to attend to the next most salient location. The main conclusions are as follows. With a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated: it achieves the best result in 32 of the evaluated metrics in the category and object recognition experiments, whereas the second-best detector obtains the best result in only eight of these metrics. The only drawback is computational time, since BIK-BUS is slower than the other detectors. Given the large differences in recognition performance, size, and time requirements, the choice of keypoint detector and descriptor has to be matched to the desired task, and we give some directions to facilitate this choice.
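The selection loop described above (fuse the conspicuity maps, attend to the most salient point, inhibit it, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-point conspicuity values are taken as given, and the min-max normalization, equal-weight averaging, spherical inhibition radius, and the names `normalize` and `select_keypoints` are all assumptions made for illustration.

```python
import numpy as np


def normalize(values):
    """Min-max scale a per-point map to [0, 1]; a constant map becomes all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def select_keypoints(points, orientation, intensity, color,
                     num_keypoints, inhibition_radius):
    """Return indices of keypoints picked by a winner-take-all loop.

    Assumption: the saliency of a point is the plain average of the three
    normalized conspicuity values; the abstract does not specify the fusion rule.
    """
    points = np.asarray(points, dtype=float)
    saliency = (normalize(orientation) + normalize(intensity)
                + normalize(color)) / 3.0

    available = np.ones(len(points), dtype=bool)
    keypoints = []
    while len(keypoints) < num_keypoints and available.any():
        # Attend to the most salient point that has not been inhibited yet.
        masked = np.where(available, saliency, -np.inf)
        winner = int(np.argmax(masked))
        keypoints.append(winner)
        # Inhibit the winner and its neighbourhood (assumed spherical) so the
        # next iteration moves on to the next most salient location.
        distances = np.linalg.norm(points - points[winner], axis=1)
        available &= distances > inhibition_radius
    return keypoints


# Toy example: four points on a line, identical values in all three maps.
pts = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
conspicuity = [0.9, 0.8, 0.5, 0.1]
print(select_keypoints(pts, conspicuity, conspicuity, conspicuity, 2, 0.5))
# prints [0, 2]
```

In the toy example, point 1 is the second most salient but lies within the inhibition radius of point 0, so the second keypoint is point 2. This shows how inhibition spreads keypoints across the cloud rather than clustering them around a single salient region.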
