Behavior recognition from video based on human constrained descriptor and adaptable neural networks

In this paper, we introduce a new descriptor, the Human Constrained Pixel Change History (HC-PCH), which is based on the Pixel Change History (PCH) but focuses on human body movements over time. The proposed modification of the conventional PCH entails the calculation of two probabilistic maps, based on human face detection and human body detection respectively. The features extracted from this descriptor are used as input to an HMM-based behavior recognition framework. We also introduce a rectification framework for behavior recognition and classification that incorporates an expert user's feedback into the learning process through two proposed schemes: a plain non-linear scheme and an adaptable scheme, the latter of which requires fewer training samples and is more effective at reducing misclassification error. The presented methods are validated on a real-world computer vision dataset comprising challenging video sequences from an industrial environment.
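To make the idea of a human-constrained PCH more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly cited PCH update rule (a changed pixel rises by 255/rise up to 255, an unchanged pixel decays by 255/decay down to 0). The abstract does not say how the face and body maps are combined or applied, so the noisy-OR combination in `combine_maps` and the use of the result as a per-pixel weight on the accumulation step are purely illustrative assumptions, as are all function and parameter names.

```python
from typing import List

Grid = List[List[float]]


def combine_maps(face_prob: Grid, body_prob: Grid) -> Grid:
    """Merge two per-pixel probability maps into one weight map.

    Assumption: a noisy-OR combination, chosen only for illustration.
    """
    return [
        [1.0 - (1.0 - f) * (1.0 - b) for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(face_prob, body_prob)
    ]


def update_pch(pch: Grid, changed: List[List[int]], weight: Grid,
               rise: float = 4.0, decay: float = 16.0,
               max_val: float = 255.0) -> Grid:
    """Apply one PCH time step, with accumulation scaled by `weight`.

    With weight == 1 everywhere this reduces to the plain PCH update.
    Scaling the rise by a human-likelihood weight is a hypothetical way
    to constrain the descriptor to human motion.
    """
    out = []
    for p_row, d_row, w_row in zip(pch, changed, weight):
        row = []
        for p, d, w in zip(p_row, d_row, w_row):
            if d:
                # Pixel changed: accumulate, capped at max_val.
                row.append(min(p + w * max_val / rise, max_val))
            else:
                # Pixel unchanged: decay, floored at zero.
                row.append(max(p - max_val / decay, 0.0))
        out.append(row)
    return out


# Tiny 1x2 example: the left pixel lies on a detected face, the right does not.
face = [[1.0, 0.0]]
body = [[0.0, 0.0]]
weight = combine_maps(face, body)
step1 = update_pch([[0.0, 0.0]], [[1, 1]], weight)  # both pixels changed
step2 = update_pch(step1, [[0, 0]], weight)         # no change, so decay
```

In this sketch, motion at pixels with low human likelihood contributes little to the history, so the features passed on to a recognizer are dominated by human movement rather than, for example, machinery in the background.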
