Optimal scanning for faster object detection

Recent years have seen the development of fast and accurate algorithms for detecting objects in images. However, as the size of the scene grows, so do the running times of these algorithms. If a 128×102-pixel image requires 20 ms to process, searching for objects in a 1280×1024 image will take 2 s. This is unsuitable under real-time operating constraints: by the time a frame has been processed, the object may have moved. An analogous problem occurs when controlling robot cameras that need to scan scenes in search of target objects. In this paper, we consider a method for improving the run-time of general-purpose object-detection algorithms. Our method is based on a model of visual search in humans, which schedules eye fixations to maximize the long-term information accrued about the location of the target of interest. The approach can be used to drive robot cameras that physically scan scenes or to speed up the scanning of very large, high-resolution images. We consider the latter application in this work by simulating a “digital fovea” and sequentially placing it in the regions of an image that maximize the expected information gain. We evaluate the approach using the OpenCV implementation of the Viola-Jones face detector. After accounting for all computational overhead introduced by the fixation controller, the approach doubles the speed of the standard Viola-Jones detector at little cost in accuracy.
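
The abstract only outlines the approach, but the following sketch illustrates the general idea in Python/OpenCV: maintain a coarse belief over grid cells, fixate a small foveal window at the currently most promising cell, run the Viola-Jones cascade only inside that window, and update the belief after each fixation. The grid size, the assumed hit/false-alarm rates, and the greedy maximum-a-posteriori fixation rule are illustrative assumptions, not the paper's method; the actual controller selects fixations to maximize expected long-term information gain.

```python
# Minimal sketch (not the authors' implementation) of a "digital fovea"
# scan: run the Viola-Jones cascade only inside a small window, keep a
# belief map over possible face locations, and move the fovea greedily.
# Grid size, fixation budget, and likelihood values are illustrative.
import cv2
import numpy as np

CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def foveated_search(gray, grid=(8, 8), n_fixations=12,
                    p_hit=0.9, p_false=0.05):
    """Greedy foveated search for a single face in a grayscale image."""
    h, w = gray.shape
    cell_h, cell_w = h // grid[0], w // grid[1]
    belief = np.full(grid, 1.0 / (grid[0] * grid[1]))  # uniform prior

    for _ in range(n_fixations):
        # Fixate the currently most probable cell (a simple stand-in for
        # the paper's expected-information-gain criterion).
        r, c = np.unravel_index(np.argmax(belief), grid)
        y0, x0 = r * cell_h, c * cell_w
        patch = gray[y0:y0 + cell_h, x0:x0 + cell_w]

        # Run the full-resolution detector only inside the fovea.
        # (Faces straddling cell boundaries are ignored for simplicity.)
        faces = CASCADE.detectMultiScale(patch, 1.1, 3)
        detected = len(faces) > 0

        # Bayesian belief update with assumed hit/false-alarm rates.
        like = np.full(grid, p_false if detected else 1 - p_false)
        like[r, c] = p_hit if detected else 1 - p_hit
        belief = belief * like
        belief /= belief.sum()

        if detected:
            x, y, fw, fh = faces[0]
            return (x0 + x, y0 + y, fw, fh), belief
    return None, belief
```

Because the cascade only ever sees a small patch, each fixation costs a fraction of a full-image sweep; the speedup reported in the paper comes from the fixation policy choosing those patches well rather than sweeping the entire image.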
