A fast object recognition and categorization technique for robot grasping using the visual bag of words

We present a real-time visual categorization method for robot grasping. We describe an object database with SURF feature points, which we quantize with the k-means clustering algorithm to form visual words. We then train a Support Vector Machine (SVM) classifier whose inputs are the distributions of the bags of features extracted earlier. Object recognition is likewise performed with the SVM. The real-time implementation uses the GPU module of OpenCV. In the target application, a robot manipulator equipped with a camera uses our visual system to pick up an object and drop it. Finally, our object recognition experiments show an average recognition rate between 95% and 100%.
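
The vocabulary and histogram steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python instead of the OpenCV GPU module, replaces SURF descriptors with synthetic 8-dimensional vectors, and leaves out the SVM stage. All function names (`build_vocabulary`, `bag_of_words_histogram`, `fake_descriptors`) and parameters (vocabulary size, iteration count, seeds) are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Plain k-means: the k cluster centers act as the visual words."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bag_of_words_histogram(descriptors, vocabulary):
    """Normalized count of visual words for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Synthetic stand-ins for local descriptors (hypothetical data, not SURF output).
_rng = random.Random(1)


def fake_descriptors(center, n, dim=8):
    return [[center + _rng.gauss(0, 0.05) for _ in range(dim)] for _ in range(n)]


# Two toy "images" that mix the same two kinds of local pattern differently.
image_a = fake_descriptors(0.0, 30) + fake_descriptors(1.0, 10)
image_b = fake_descriptors(0.0, 10) + fake_descriptors(1.0, 30)

vocabulary = build_vocabulary(image_a + image_b, k=2)
hist_a = bag_of_words_histogram(image_a, vocabulary)
hist_b = bag_of_words_histogram(image_b, vocabulary)
print(hist_a, hist_b)
```

In a full pipeline, histograms such as `hist_a` and `hist_b` would be the feature vectors on which the SVM is trained and which it later classifies. Real vocabularies are also far larger than the two words used here.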
