Deep Object Ranking for Template Matching

Pick-and-place is an important task in robotic manipulation. In industry, template-matching approaches are often used to provide the level of precision required to locate an object to be picked. However, if a robotic workstation is to handle numerous objects, brute-force template matching becomes expensive and is subject to notoriously hard-to-tune thresholds. In this paper, we explore the use of deep learning methods to speed up traditional methods such as template matching. In particular, we employ a Single Shot Detector (SSD) and a Residual Network (ResNet) for object detection and classification. The classification scores allow objects to be re-ranked so that template matching is performed in order of likelihood. Tests on a dataset containing 10 industrial objects demonstrate the validity of our approach, yielding an average ranking of 1.37 for the object of interest. We also tested our approach on the standard Pose dataset, which contains 15 objects, and obtained an average ranking of 1.99. Because SSD and ResNet operate in essentially constant time on a Graphics Processing Unit (GPU), our approach reaches near-constant execution time. Finally, we compared the F1 scores of LINE-2D, a state-of-the-art template matching method, under different strategies (including our own); the results show that our method is competitive with brute-force template matching. Coupled with near-constant execution time, this opens up the possibility of performing template matching on databases containing hundreds of objects.
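
The re-ranking idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `match_fn` callback (standing in for a template matcher such as LINE-2D), the acceptance threshold, and the example scores are all hypothetical. It assumes only that a classifier has already produced one score per known object.

```python
from typing import Callable, Dict, List, Optional, Tuple


def rank_objects(scores: Dict[str, float]) -> List[str]:
    """Return object names sorted from most to least likely."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def rank_of(target: str, scores: Dict[str, float]) -> int:
    """1-based position of the target object in the ranked list."""
    return rank_objects(scores).index(target) + 1


def match_in_ranked_order(
    scores: Dict[str, float],
    match_fn: Callable[[str], float],  # hypothetical template matcher
    threshold: float,                  # hypothetical acceptance threshold
) -> Tuple[Optional[str], int]:
    """Try templates in order of likelihood and stop at the first accepted match.

    Returns the accepted object name (or None) and the number of
    template-matching calls that were made.
    """
    tried = 0
    for name in rank_objects(scores):
        tried += 1
        if match_fn(name) >= threshold:
            return name, tried
    return None, tried


# Illustrative classifier scores for three made-up objects.
scores = {"bracket": 0.10, "gear": 0.70, "flange": 0.20}
ranked = rank_objects(scores)
target_rank = rank_of("flange", scores)

# Stand-in matcher: only the "flange" template matches well.
found, calls = match_in_ranked_order(
    scores, lambda name: 0.9 if name == "flange" else 0.3, threshold=0.8
)
```

In this toy case the object of interest is ranked second, so two template-matching calls are made instead of the three a brute-force pass over all objects would need. Averaging `rank_of` over many test images gives an average-ranking figure of the kind reported above.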
