A framework of joint object tracking and event detection

This paper describes a probabilistic framework for performing object tracking and event detection simultaneously in monocular video. We cast joint tracking and semantic event detection as a model-based search problem in a multi-dimensional state space, in which the object trajectory and the event type are found by maximum a posteriori (MAP) optimization. The approach benefits from combining a particle-based probabilistic representation, retention of multiple hypotheses, efficient particle propagation, and temporal optimization. Qualitative and quantitative results on realistic video sequences demonstrate its effectiveness.
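To make the idea of a joint state space concrete, the following is a minimal sketch, not the method of the paper. It assumes a 1-D position, three hypothetical event types that each imply a constant velocity (`EVENT_VELOCITY`), Gaussian motion and observation noise, and a plain bootstrap particle filter. Every name and parameter here is an illustrative assumption. Each particle is a joint hypothesis (position, event type); the event type is taken as the MAP label under the final particle posterior, while the trajectory is reported as a per-frame weighted mean rather than a true MAP path.

```python
import math
import random

# Hypothetical event models (assumption): each event implies a 1-D velocity.
EVENT_VELOCITY = {"stand": 0.0, "walk": 1.0, "run": 3.0}


def gaussian_likelihood(residual, sigma):
    """Unnormalized Gaussian likelihood of an observation residual."""
    return math.exp(-0.5 * (residual / sigma) ** 2)


def joint_track_and_detect(observations, n_particles=500,
                           motion_sigma=0.3, obs_sigma=0.5, seed=0):
    """Bootstrap particle filter over the joint state (position, event).

    Returns (trajectory, map_event): one position estimate per frame and
    the event label with the largest share of the final particle set.
    """
    rng = random.Random(seed)
    events = list(EVENT_VELOCITY)

    # Initialize around the first observation with a uniform event prior,
    # so several event hypotheses are retained at the start.
    particles = [(observations[0] + rng.gauss(0.0, obs_sigma),
                  rng.choice(events)) for _ in range(n_particles)]
    trajectory = [sum(x for x, _ in particles) / n_particles]

    for z in observations[1:]:
        # Propagate each particle with the motion model of its own event.
        particles = [(x + EVENT_VELOCITY[e] + rng.gauss(0.0, motion_sigma), e)
                     for x, e in particles]

        # Weight each hypothesis by how well it explains the observation.
        weights = [gaussian_likelihood(z - x, obs_sigma) for x, _ in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: fall back to uniform
            weights = [1.0] * n_particles
            total = float(n_particles)

        # Per-frame position estimate: weighted mean of the particle set.
        trajectory.append(
            sum(w * x for w, (x, _) in zip(weights, particles)) / total)

        # Resample in proportion to weight; unlikely hypotheses die out.
        particles = rng.choices(particles, weights=weights, k=n_particles)

    # After resampling the particles are equally weighted, so the MAP
    # event is simply the most frequent label.
    counts = {e: 0 for e in events}
    for _, e in particles:
        counts[e] += 1
    map_event = max(counts, key=counts.get)
    return trajectory, map_event


if __name__ == "__main__":
    # Synthetic target moving one unit per frame.
    obs = [float(t) for t in range(20)]
    traj, event = joint_track_and_detect(obs)
    print(event, round(traj[-1], 2))
```

Because the event label travels with each particle, resampling prunes event hypotheses and position hypotheses together, which is the sense in which tracking and detection are solved jointly in this sketch. A fuller treatment would let the event change over time and recover the MAP sequence with a Viterbi-style pass over the stored particles.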
