Introspective perception: Learning to predict failures in vision systems

As robots aspire to long-term autonomous operation in complex, dynamic environments, the ability to reliably make mission-critical decisions in ambiguous situations becomes essential. This motivates the need to build systems with the situational awareness to assess how qualified they are, at a given moment, to make a decision. We call this self-evaluating capability introspection. In this paper, we take a small step in this direction and propose a generic framework for introspective behavior in perception systems. Our goal is to learn a model that reliably predicts failures in a given system, with respect to a task, directly from input sensor data. We present this in the context of vision-based autonomous micro aerial vehicle (MAV) flight in outdoor natural environments and show that it effectively handles uncertain situations.
