Model-based active learning to detect isometric deformable objects in the wild with deep architectures

In recent years, algorithms based on Convolutional Neural Networks (CNNs) have achieved significant milestones in object recognition. Because standard datasets contain many examples of each object class, models trained on them handle inter-class variability well. However, gathering enough data to train for a particular instance of an object within a class is impractical. Furthermore, quantitatively assessing the imaging conditions of every image in a given dataset is not feasible. By generating sufficient images with known imaging conditions, we study to what extent CNNs can cope with hard imaging conditions for instance-level recognition in an active learning regime. Leveraging powerful rendering techniques to achieve instance-level detection, we present results of training three state-of-the-art object detection algorithms, namely Fast R-CNN, Faster R-CNN and YOLO9000, under hard imaging conditions imposed on the scene by rendering. Our extensive experiments yield a mean Average Precision (mAP) of 0.92 on synthetic images and 0.83 on real images with the best-performing detector, Faster R-CNN. We show for the first time how well detection algorithms based on deep architectures fare under each hard imaging condition studied.
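For readers unfamiliar with the metric, the sketch below illustrates one common way a mean Average Precision score can be computed from ranked detections. The abstract does not state which evaluation protocol was used, so the all-point interpolation shown here (area under the precision envelope) is an assumption, and the function names `average_precision` and `mean_average_precision` are hypothetical. Matching detections to ground-truth boxes (for example by an overlap threshold) is assumed to have been done already and is passed in as boolean flags.

```python
from typing import Sequence


def average_precision(
    scores: Sequence[float],
    is_true_positive: Sequence[bool],
    num_ground_truth: int,
) -> float:
    """Area under the interpolated precision-recall curve for one class.

    Assumption: each detection was already matched against the ground truth,
    so is_true_positive[i] says whether detection i is a correct hit.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    if len(scores) != len(is_true_positive):
        raise ValueError("scores and is_true_positive must have equal length")

    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0
    fp = 0
    recalls = []
    precisions = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: at each rank, use the best precision achievable
    # at that recall or any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Unweighted mean of per-class (or per-instance) AP values."""
    if not per_class_ap:
        raise ValueError("need at least one AP value")
    return sum(per_class_ap) / len(per_class_ap)
```

Under this convention, a detector that ranks every correct detection above every false one and finds all ground-truth objects scores 1.0, and missed objects lower the score because recall never reaches 1.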
