Vector field analysis for multi-object behavior modeling

This paper proposes an end-to-end system for recognizing multi-person behaviors in video, unifying tasks such as segmentation, modeling, and recognition within a single optical-flow-based motion analysis framework. We show how optical flow can be used to analyze the activities of individual actors, as opposed to dense crowds, on which most of the existing literature has concentrated. The algorithm consists of two steps: identification of motion patterns and modeling of motion patterns. Activities are analyzed through the underlying motion patterns formed by the optical flow field over a period of time. Streaklines capture these motion patterns via integration of the flow field. To locate regions of interest, we use the Helmholtz decomposition to compute the divergence potential. The extrema, or critical points, of this potential indicate regions of high activity in the video, which are then represented as motion patterns by clustering the streaklines. We then present a method to compare two videos by measuring the similarity between their motion patterns using a combination of shape theory and subspace analysis. Such an analysis allows us to represent, compare, and recognize a wide range of activities. We perform experiments on state-of-the-art datasets and show that the proposed method is suitable for natural videos in the presence of noise, background clutter, and high intra-class variation. Our method has two significant advantages over recent related approaches: it provides a single framework that handles both low-level and high-level visual analysis tasks, and it is computationally efficient.
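As a rough illustration of the divergence-potential step, the sketch below recovers a scalar potential from a 2D flow field by solving a Poisson equation with FFTs and then reads off its global extrema. This is not the paper's implementation. It assumes a periodic image domain, unit pixel spacing, and the sign convention in which the curl-free part of the flow is the gradient of the potential, so sources appear as minima and sinks as maxima. The function names `divergence_potential` and `strongest_source_and_sink` are hypothetical.

```python
import numpy as np


def divergence_potential(u, v):
    """Return phi such that laplacian(phi) = div(u, v).

    Assumptions (not from the paper): periodic boundaries, unit grid
    spacing, and the convention that the curl-free part equals grad(phi).
    u and v are 2D arrays of x- and y-components, indexed [row, column].
    """
    h, w = u.shape
    # Angular frequencies along x (columns) and y (rows).
    kx = 2.0 * np.pi * np.fft.fftfreq(w)
    ky = 2.0 * np.pi * np.fft.fftfreq(h)
    KX, KY = np.meshgrid(kx, ky)  # both have shape (h, w)

    # Divergence computed spectrally: d/dx -> i*kx, d/dy -> i*ky.
    div_hat = 1j * KX * np.fft.fft2(u) + 1j * KY * np.fft.fft2(v)

    # The Laplacian becomes multiplication by -(kx^2 + ky^2).
    k2 = KX ** 2 + KY ** 2
    k2[0, 0] = 1.0  # avoid division by zero at the mean (DC) term
    phi_hat = -div_hat / k2
    phi_hat[0, 0] = 0.0  # the potential is defined up to a constant

    return np.real(np.fft.ifft2(phi_hat))


def strongest_source_and_sink(phi):
    """Return (row, col) of the global minimum and maximum of phi.

    Under the convention above, the minimum marks the strongest source
    (flow diverging) and the maximum the strongest sink (flow converging).
    """
    src = np.unravel_index(np.argmin(phi), phi.shape)
    snk = np.unravel_index(np.argmax(phi), phi.shape)
    return (int(src[0]), int(src[1])), (int(snk[0]), int(snk[1]))
```

In practice `u` and `v` would come from an optical flow estimator, possibly averaged over a temporal window, and local rather than global extrema would be needed to find several regions of activity. Those choices are left open here.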
