Viewpoint selection for visual failure detection

Differences between task outcomes in robotics are often visually subtle, such as the tip of a screw being near a hole versus in it. Furthermore, these small differences are often observable only from certain viewpoints, and fully verifying an outcome may require information from multiple viewpoints. We introduce and compare three approaches to selecting viewpoints for verifying successful task execution: (1) a random-forest-based method that discovers highly informative fine-grained visual features, (2) SVM models trained on features extracted from pre-trained convolutional neural networks, and (3) an active, hybrid approach that uses the two methods above for two-stage multi-viewpoint classification. We validate these approaches experimentally on an IKEA furniture assembly task and a quadrotor surveillance domain.
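
As a rough illustration of what two-stage multi-viewpoint classification could look like, the sketch below accepts the verdict from a first viewpoint when it is confident and otherwise consults one additional viewpoint. It is not the procedure from the paper: the function name `two_stage_verify`, the confidence threshold, the per-viewpoint informativeness scores, and the averaging rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def two_stage_verify(
    first_view: str,
    candidate_views: List[str],
    predict_success: Callable[[str], float],
    view_scores: Dict[str, float],
    confident: float = 0.9,
) -> Tuple[bool, List[str]]:
    """Return (success decision, viewpoints consulted).

    All names here are hypothetical. predict_success(view) stands in for any
    per-viewpoint classifier that returns P(success); view_scores stands in
    for any estimate of how informative each viewpoint is.
    """
    p = predict_success(first_view)
    used = [first_view]

    # Stage 1: keep the first verdict if it is confident in either direction.
    if p >= confident or p <= 1.0 - confident:
        return p >= 0.5, used

    # Stage 2: consult the highest-scoring remaining viewpoint, if any.
    remaining = [v for v in candidate_views if v != first_view]
    if not remaining:
        return p >= 0.5, used
    best = max(remaining, key=lambda v: view_scores.get(v, 0.0))
    used.append(best)

    # Assumed fusion rule: average the two probabilities.
    p = 0.5 * (p + predict_success(best))
    return p >= 0.5, used


if __name__ == "__main__":
    # Made-up probabilities and scores, purely to exercise the function.
    probs = {"top": 0.55, "side": 0.10, "front": 0.95}
    scores = {"top": 0.2, "side": 0.9, "front": 0.5}
    print(two_stage_verify("top", list(probs), probs.__getitem__, scores))
```

In this toy setup an ambiguous first view triggers a second look from the viewpoint with the highest assumed informativeness score, whereas a confident first view ends verification after a single observation.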
