Detecting people in cluttered indoor scenes

Motion is an important visual cue for scene analysis. It is particularly useful when the scene is cluttered, as in typical home or office environments. We present a motion segmentation algorithm that uses temporal differencing to detect moving people in cluttered indoor scenes. The algorithm is based on a couple of perceptual organization principles. To deal with missing data, noise, and outliers, it employs a robust segmentation and grouping technique called tensor voting. The resulting real-time people detector can handle multiple persons as well as varying body sizes and poses. It requires no initialization and uses a subjective threshold that defines the minimum saliency of "significant" motion. Its only two parameters are the scales (sizes) of the local neighborhoods for region and contour analysis.
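As a rough illustration of the temporal differencing step only, the sketch below marks pixels whose intensity changes between two consecutive grayscale frames by more than a threshold. It is not the algorithm presented here: the tensor voting grouping stage is omitted, and the function name `temporal_difference_mask`, the toy frames, and the threshold value of 25 are all assumptions made for the example.

```python
def temporal_difference_mask(prev_frame, curr_frame, threshold):
    """Return a binary mask (1 = changed) for two equally sized grayscale frames.

    Frames are lists of rows of integer intensities. This is plain frame
    differencing with a fixed threshold (an assumed, simplified stand-in).
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same height")
    mask = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        if len(prev_row) != len(curr_row):
            raise ValueError("frames must have the same width")
        # A pixel counts as moving when its absolute intensity change
        # exceeds the threshold.
        mask.append([1 if abs(c - p) > threshold else 0
                     for p, c in zip(prev_row, curr_row)])
    return mask


# Toy 3x4 frames: a static background and a small bright region that appears.
prev_frame = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
curr_frame = [[10, 12, 10, 10],
              [10, 90, 95, 10],
              [10, 88, 10, 10]]

mask = temporal_difference_mask(prev_frame, curr_frame, threshold=25)
for row in mask:
    print(row)
```

A raw difference mask like this is typically noisy and incomplete, with holes inside moving regions and isolated false detections, which is the kind of missing data and outliers that a robust grouping stage is meant to handle.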
