The MOPED framework: Object recognition and pose estimation for manipulation

We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance with Iterative Clustering Estimation (ICE), a novel algorithm that iteratively combines feature clustering with robust pose estimation. Feature clustering quickly partitions the scene and produces object hypotheses. The hypotheses are used to further refine the feature clusters, and the two steps iterate until convergence. ICE is easy to parallelize, and easily integrates single- and multi-camera object recognition and pose estimation. We also introduce a novel object hypothesis scoring function based on M-estimator theory, and a novel pose clustering algorithm that robustly handles recognition outliers. We achieve scalability and low latency with an improved feature matching algorithm for large databases, a GPU/CPU hybrid architecture that exploits parallelism at all levels, and an optimized resource scheduler. We provide extensive experimental results demonstrating state-of-the-art performance in terms of recognition, scalability, and latency in real-world robotic applications.

[1]  Peter J. Huber,et al.  Robust Statistics , 2005, Wiley Series in Probability and Statistics.

[2]  Vincent Lepetit,et al.  Monocular Model-Based 3D Tracking of Rigid Objects: A Survey , 2005, Found. Trends Comput. Graph. Vis..

[3]  R. Redner,et al.  Mixture densities, maximum likelihood, and the EM algorithm , 1984 .

[4]  Alexandru Tupan,et al.  Triangulation , 1997, Comput. Vis. Image Underst..

[5]  Wen-Yan Chang,et al.  On pose recovery for generalized visual sensors , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Peter M. Kogge,et al.  Cache implications of aggressively pipelined high performance microprocessors , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[7]  Clark F. Olson,et al.  Efficient Pose Clustering Using a Randomized Algorithm , 1997, International Journal of Computer Vision.

[8]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[9]  Changchang Wu,et al.  SiftGPU : A GPU Implementation of Scale Invariant Feature Transform (SIFT) , 2007 .

[10]  Michael Allen,et al.  Parallel programming: techniques and applications using networked workstations and parallel computers , 1998 .

[11]  Alvaro Collet,et al.  Making specific features less discriminative to improve point-based 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Geoffrey A. Hollinger,et al.  HERB: a home exploring robotic butler , 2010, Auton. Robots.

[13]  R. Hartley Triangulation, Computer Vision and Image Understanding , 1997 .

[14]  Axel Pinz,et al.  Robust Pose Estimation from a Planar Target , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Vincent Lepetit,et al.  Fast Keypoint Recognition Using Random Ferns , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  William Grimson,et al.  Object recognition by computer - the role of geometric constraints , 1991 .

[17]  Christian Laugier,et al.  The International Journal of Robotics Research (IJRR) - Special issue on ``Field and Service Robotics '' , 2009 .

[18]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[20]  Francesco Zanichelli,et al.  The long and winding road to high-performance image processing with MMX/SSE , 2000, Proceedings Fifth IEEE International Workshop on Computer Architectures for Machine Perception.

[21]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[22]  Siddhartha S. Srinivasa,et al.  Object recognition and full pose registration from a single image for robotic manipulation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[23]  David S. Johnson,et al.  Fast Algorithms for Bin Packing , 1974, J. Comput. Syst. Sci..

[24]  Larry S. Davis,et al.  Model-based object pose in 25 lines of code , 1992, International Journal of Computer Vision.

[25]  B. Ripley,et al.  Robust Statistics , 2018, Encyclopedia of Mathematical Geosciences.

[26]  Andrea Salgian,et al.  Appearance-based object recognition using multiple views , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[27]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[28]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[29]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Siddhartha S. Srinivasa,et al.  MOPED: A scalable and low latency object recognition and pose estimation system , 2010, 2010 IEEE International Conference on Robotics and Automation.

[31]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[32]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[33]  Luc Van Gool,et al.  Fast scale invariant feature detection and matching on programmable graphics hardware , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[34]  Zhengyou Zhang,et al.  Parameter estimation techniques: a tutorial with application to conic fitting , 1997, Image Vis. Comput..

[35]  D. Marquardt An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .

[36]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  David G. Lowe,et al.  What and Where: 3D Object Recognition with Accurate Pose , 2006, Toward Category-Level Object Recognition.

[38]  Christian Perwass,et al.  Increasing pose estimation performance using multi-cue integration , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[39]  Ranga Vemuri,et al.  Hardware-software partitioning and pipelined scheduling of transformative applications , 2002, IEEE Trans. Very Large Scale Integr. Syst..

[40]  Siddhartha S. Srinivasa,et al.  Efficient multi-view object recognition and full pose estimation , 2010, 2010 IEEE International Conference on Robotics and Automation.

[41]  Е. А. Краснобаев Распознавание дорожных знаков на изображениях методом Speeded Up Robust Features (SURF) , 2013 .

[42]  Richard Szeliski,et al.  Recovering 3D Shape and Motion from Image Streams Using Nonlinear Least Squares , 1994, J. Vis. Commun. Image Represent..

[43]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[44]  Shree K. Nayar,et al.  A general imaging model and a method for finding its parameters , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.