A reinforcement learning approach to active camera foveation

In this paper we report on techniques for automatically learning foveal sensing strategies for an active pan-tilt-zoom camera. The approach uses reinforcement learning to discover foveal actions that maximize the performance of visual detectors, which are in turn assumed to be highly correlated with the task at hand. In our case, the main goal is to recognize people, hence a frontal face detection module is employed. The system learns if, when, and how to foveate on a subject, based on its previous experience in terms of successful actions in similar situations. An action is successful if it leads to a correct face detection in the high-resolution images obtained when the subject is zoomed in. In contrast with existing methods, the proposed approach obviates the need for camera calibration and camera performance modeling. Also, the method does not rely on active tracking of targets. Experimental results show that the system can be deployed in unconstrained surveillance environments and is capable of learning foveation strategies without requiring extensive a priori information or environmental models. Results also illustrate how the system effectively learns a strategy that allows the camera to foveate only in situations where successful detection is highly likely.
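The learning loop described above can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's implementation: the two-valued state ("facing" or "away"), the detection probabilities, the reward values (including the penalty for a failed foveation), and all function names are hypothetical choices made only to show how a reward tied to successful face detection could shape a policy that decides when to foveate.

```python
import random
from collections import defaultdict

# Hypothetical action set: keep the wide view, or zoom in on the subject.
ACTIONS = ("wait", "foveate")


def q_update(q, state, action, reward, next_state, alpha=0.05, gamma=0.9):
    """Apply one standard one-step Q-learning update to the table q."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the two actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def simulate(steps=5000, epsilon=0.1, seed=0):
    """Train on a toy environment; every number below is an assumption."""
    rng = random.Random(seed)
    q = defaultdict(float)
    # Assumed chance that the face detector fires after zooming in.
    detect_prob = {"facing": 0.9, "away": 0.05}
    states = list(detect_prob)
    state = rng.choice(states)
    for _ in range(steps):
        action = choose_action(q, state, epsilon, rng)
        if action == "foveate":
            # Success means a face detection in the zoomed image;
            # the -1.0 cost for a wasted zoom is an illustrative choice.
            reward = 1.0 if rng.random() < detect_prob[state] else -1.0
        else:
            reward = 0.0
        next_state = rng.choice(states)  # subject pose changes at random
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


if __name__ == "__main__":
    table = simulate()
    for s in ("facing", "away"):
        print(s, max(ACTIONS, key=lambda a: table[(s, a)]))
```

In this toy setting the learned table should favor foveating only in the state where detection is likely, which mirrors the behavior the abstract reports, without requiring any camera calibration model in the learner itself.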
