Integrated Tracking and Recognition of Human Activities in Shape Space

Activity recognition consists of two fundamental tasks: tracking the features or objects of interest, and recognizing the activities. In this paper, we show that these two tasks can be integrated within the framework of a dynamical feedback system. In our proposed method, the recognized activity is continuously adapted based on the output of the tracking algorithm, which in turn is driven by the identity of the recognized activity. A nonlinear, non-stationary stochastic dynamical model on the “shape” of the objects participating in the activities represents their motion and forms the basis of the tracking algorithm. The tracked observations are used to recognize the activities by comparison against a prior database. Measures designed to evaluate the performance of the tracking algorithm serve as the feedback signal. The method can automatically detect changes and switch between activities that occur one after another, which is akin to segmenting a long sequence into homogeneous parts. The entire process of tracking, recognition, change detection, and model switching runs recursively as new video frames become available. We demonstrate the effectiveness of the method on real-life video and analyze its performance in terms of metrics such as detection delay and false alarms.
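The feedback loop described above (recognize, monitor a tracker-performance statistic, switch models when it degrades) can be illustrated with a minimal sketch. It is not the authors' algorithm. The particle-filter tracker and the non-stationary dynamical shape model are replaced by already-tracked landmark observations. Each activity in the database is reduced to a single mean shape. The feedback measure is assumed to be a CUSUM of the full Procrustes distance between each observed configuration and the current activity's mean shape. The function names, the `drift` and `threshold` parameters, and the toy "square"/"rectangle" activities are all hypothetical. Landmarks are stored as complex numbers, so removing translation, scale, and rotation (what makes the representation a "shape") takes only a few lines of arithmetic.

```python
import math

# Hypothetical sketch: a 2-D landmark configuration is a list of complex numbers x + iy.


def preshape(points):
    """Remove translation and scale: center the landmarks, then scale to unit norm."""
    mean = sum(points) / len(points)
    centered = [p - mean for p in points]
    norm = math.sqrt(sum(abs(c) ** 2 for c in centered))
    return [c / norm for c in centered]


def procrustes_distance(a, b):
    """Full Procrustes distance; |<z, w>| is invariant to rotating either shape."""
    z, w = preshape(a), preshape(b)
    inner = sum(zi.conjugate() * wi for zi, wi in zip(z, w))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def recognize(shape, database):
    """Label of the database activity whose (assumed) mean shape is closest."""
    return min(database, key=lambda name: procrustes_distance(shape, database[name]))


def track_and_segment(observations, database, drift=0.05, threshold=0.3):
    """Recognize, monitor a CUSUM of the model-based error, switch on change.

    Returns (frame_index, activity_label) pairs, one per detected segment.
    """
    current = recognize(observations[0], database)
    segments = [(0, current)]
    cusum = 0.0
    for t, obs in enumerate(observations[1:], start=1):
        # Feedback signal: how poorly the current activity's model explains frame t.
        error = procrustes_distance(obs, database[current])
        cusum = max(0.0, cusum + error - drift)
        if cusum > threshold:
            # Change detected: re-run recognition and switch models if needed.
            label = recognize(obs, database)
            if label != current:
                current = label
                segments.append((t, current))
            cusum = 0.0
    return segments


if __name__ == "__main__":
    square = [0, 1, 1 + 1j, 1j]
    rectangle = [0, 2, 2 + 0.2j, 0.2j]
    db = {"square": square, "rectangle": rectangle}
    print(track_and_segment([square] * 10 + [rectangle] * 10, db))
    # [(0, 'square'), (10, 'rectangle')]
```

Raising `threshold` or `drift` makes such a detector less prone to false alarms at the cost of a longer detection delay. These are the two metrics the abstract uses to evaluate change detection.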
