Robust Visual Behavior Recognition

In this article, we propose a novel framework for robust visual behavior understanding, capable of achieving high recognition rates in demanding real-life environments and in almost real time. Our approach is based on the utilization of holistic visual behavior understanding methods, which perform modeling directly at the pixel level. This way, we eliminate the world representation layer that can be a significant source of errors for the modeling algorithms. Our proposed system is based on the utilization of information from multiple cameras, aiming to alleviate the effects of occlusions and other similar artifacts, which are rather common in real-life installations. To effectively exploit the acquired information for the purpose of real-time activity recognition, appropriate methodologies for modeling of sequential data stemming from multiple sources are examined. Moreover, we explore the efficacy of the additional application of semisupervised learning methodologies, in an effort to reduce the cost of model training in a completely supervised fashion. The performance of the examined approaches is thoroughly evaluated under real-life visual behavior understanding scenarios, and the obtained results are compared and discussed.

[1]  Haihong Hu,et al.  Factorial HMM and Parallel HMM for Gait Recognition , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[2]  Mari Ostendorf,et al.  HMM topology design using maximum likelihood successive state splitting , 1997, Comput. Speech Lang..

[3]  Shaogang Gong,et al.  Beyond Tracking: Modelling Activity and Understanding Behaviour , 2006, International Journal of Computer Vision.

[4]  Gautam Biswas,et al.  Temporal Pattern Generation Using Hidden Markov Model Based Unsupervised Classification , 1999, IDA.

[5]  Xinghua Sun,et al.  Action recognition via local descriptors and holistic features , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Eli Shechtman,et al.  Space-Time Behavior-Based Correlation-OR-How to Tell If Two Underlying Motion Fields Are Similar Without Computing Them? , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[8]  Tieniu Tan,et al.  A survey on visual surveillance of object motion and behaviors , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[9]  Sotirios Chatzis,et al.  Robust Sequential Data Modeling Using an Outlier Tolerant Hidden Markov Model , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  R. Mukundan,et al.  Moment Functions in Image Analysis: Theory and Applications , 1998 .

[11]  Kevin P. Murphy,et al.  A coupled HMM for audio-visual speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  James W. Davis,et al.  The representation and recognition of human movement using temporal templates , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Lawrence Carin,et al.  Semisupervised Learning of Hidden Markov Models via a Homotopy Method , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  W. Eric L. Grimson,et al.  Learning Patterns of Activity Using Real-Time Tracking , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Zhihong Zeng,et al.  Audio–Visual Affective Expression Recognition Through Multistream Fused HMM , 2008, IEEE Transactions on Multimedia.

[16]  Dana H. Ballard,et al.  Computer Vision , 1982 .

[17]  Sergios Theodoridis,et al.  Hierarchical Feature Fusion for Visual Tracking , 2007, 2007 IEEE International Conference on Image Processing.

[18]  G. McLachlan Discriminant Analysis and Statistical Pattern Recognition , 1992 .

[19]  Juergen Luettin,et al.  Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..

[20]  James W. Davis,et al.  The Representation and Recognition of Action Using Temporal Templates , 1997, CVPR 1997.

[21]  Juergen Luettin,et al.  Asynchronous stream modeling for large vocabulary audio-visual speech recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.