Dual-Layer Density Estimation for Multiple Object Instance Detection

This paper introduces a dual-layer density estimation-based architecture for multiple object instance detection in robot inventory management applications. The approach consists of raw scale-invariant feature transform (SIFT) feature matching and key point projection. The dominant scale ratio and a reference clustering threshold are estimated using the first layer of the density estimation. A cascade of filters is applied after feature template reconstruction and refined feature matching to eliminate false matches. Before the second layer of density estimation, the adaptive threshold is finalized by multiplying an empirical coefficient for the reference value. The coefficient is identified experimentally. Adaptive threshold-based grid voting is applied to find all candidate object instances. Error detection is eliminated using final geometric verification in accordance with Random Sample Consensus (RANSAC). The detection results of the proposed approach are evaluated on a self-built dataset collected in a supermarket. The results demonstrate that the approach provides high robustness and low latency for inventory management application.

[1]  Siddhartha S. Srinivasa,et al.  The MOPED framework: Object recognition and pose estimation for manipulation , 2011, Int. J. Robotics Res..

[2]  Winston H. Hsu,et al.  Large-scale simultaneous multi-object recognition and localization via bottom up search-based approach , 2012, ACM Multimedia.

[3]  Kota Iwamoto,et al.  Multiple object identification using grid voting of object center estimated from keypoint matches , 2013, 2013 IEEE International Conference on Image Processing.

[4]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[5]  Luo Juan,et al.  A comparison of SIFT, PCA-SIFT and SURF , 2009 .

[6]  H. Läuter,et al.  Silverman, B. W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall, London – New York 1986, 175 pp., £12.— , 1988 .

[7]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[8]  Zhang Jin SIFT matching method based on base scale transformation , 2014 .

[9]  Richard Szeliski,et al.  Recovering 3D Shape and Motion from Image Streams Using Nonlinear Least Squares , 1994, J. Vis. Commun. Image Represent..

[10]  D. Lowe,et al.  Fast Matching of Binary Features , 2012, 2012 Ninth Conference on Computer and Robot Vision.

[11]  Yasemin Yardimci,et al.  Improved SIFT matching for image pairs with scale difference , 2010 .

[12]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[13]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[14]  C. D. Kemp,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[15]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[16]  J. Paul Siebert,et al.  Unsupervised clustering in Hough space for recognition of multiple instances of the same object in a cluttered scene , 2010, Pattern Recognit. Lett..

[17]  Michael F. Cohen,et al.  Image and Video Matting: A Survey , 2007, Found. Trends Comput. Graph. Vis..

[18]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Vincent Lepetit,et al.  Online learning of patch perspective rectification for efficient object detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[21]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Martial Hebert,et al.  Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Winston H. Hsu,et al.  Multiple object localization by context-aware adaptive window search and search-based object recognition , 2011, MM '11.

[24]  Shih-Fu Chang,et al.  Semi-supervised hashing for scalable image retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Kota Iwamoto,et al.  Local Feature Based Multiple Object Instance Identification Using Scale and Rotation Invariant Implicit Shape Model , 2014, ACCV Workshops.

[28]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[29]  Manuela M. Veloso,et al.  Detection and Localization of Multiple Objects , 2006, 2006 6th IEEE-RAS International Conference on Humanoid Robots.