Pre-Attentive and Attentive Detection of Humans in Wide-Field Scenes

We address the problem of localizing and obtaining high-resolution footage of the people present in a scene. We propose a biologically-inspired solution combining pre-attentive, low-resolution sensing for detection with shiftable, high-resolution, attentive sensing for confirmation and further analysis. The detection problem is made difficult by the unconstrained nature of realistic environments and human behaviour, and by the low resolution of pre-attentive sensing. Analysis of human peripheral vision suggests a solution based on integration of relatively simple but complementary cues. We develop a Bayesian approach involving layered probabilistic modeling and spatial integration using a flexible norm that maximizes the statistical power of both dense and sparse cues. We compare the statistical power of several cues and demonstrate the advantage of cue integration. We evaluate the Bayesian cue integration method for human detection on a labelled surveillance database and find that it outperforms several competing methods based on conjunctive combinations of classifiers (e.g., AdaBoost). We have developed a real-time version of our pre-attentive human activity sensor that generates saccadic targets for an attentive foveated vision system. Output from high-resolution attentive detection algorithms and gaze state parameters are fed back as statistical priors and combined with pre-attentive cues to determine saccadic behaviour. The result is a closed-loop system that fixates faces over a 130 deg field of view, allowing high-resolution capture of facial video over a large dynamic scene.
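
The cue integration idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the norm or the probabilistic model, so the code assumes a Minkowski-style p-norm for spatial pooling and conditionally independent cues combined through log-likelihood ratios. The function names `minkowski_pool` and `posterior_human`, and all numeric values, are hypothetical.

```python
import math


def minkowski_pool(responses, p):
    """Pool non-negative cue responses over a window with a normalized p-norm.

    Assumption: a Minkowski-style norm stands in for the 'flexible norm'.
    p = 1 gives the mean (favours dense cues); large p approaches the max
    (favours sparse cues).
    """
    n = len(responses)
    return (sum(r ** p for r in responses) / n) ** (1.0 / p)


def posterior_human(log_likelihood_ratios, prior):
    """Combine per-cue log-likelihood ratios into a posterior probability.

    Assumption: cues are conditionally independent given the class, so their
    log-likelihood ratios add to the prior log-odds.
    """
    log_odds = math.log(prior / (1.0 - prior)) + sum(log_likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# A dense cue (weak response everywhere) and a sparse cue (one strong response).
dense = [0.5, 0.5, 0.5, 0.5]
sparse = [0.0, 0.0, 0.0, 1.0]

dense_mean = minkowski_pool(dense, 1)    # averaging keeps the dense cue's response
sparse_mean = minkowski_pool(sparse, 1)  # averaging dilutes the sparse cue
sparse_p2 = minkowski_pool(sparse, 2)    # a higher exponent recovers more of it

# Hypothetical per-cue log-likelihood ratios (human vs. background) and a
# hypothetical prior, e.g. one fed back from the attentive stage.
posterior = posterior_human([0.8, 1.2], prior=0.1)
```

In this toy case averaging (p = 1) reduces the sparse cue's pooled response to half that of the dense cue, while p = 2 brings the two level. This suggests why a tunable exponent could serve both cue types.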
