A symbolic framework for recognizing activities in full motion surveillance videos

We present a symbolic framework for automatically recognizing activities of interest in video streams in real time. The framework uses regular expressions to symbolically represent (possibly infinite) sets of motion characteristics obtained from a video. It handles trajectory-based and periodic articulated activities uniformly and provides polynomial-time graph algorithms for fast recognition. The regular expressions representing motion characteristics can either be provided manually or learnt automatically, using offline automata learning frameworks, from positive and negative example strings that describe dynamic behavior. Each recognition is assigned a confidence measure based on the Levenshtein distance between the string representing a motion signature and the regular expression describing an activity. We have used the framework to recognize trajectory-based activities (vehicle U-turns, left turns, right turns, and K-turns; vehicle starts and stops; a person running and walking) and periodic articulated activities (hand waving, boxing, hand clapping, and digging) in videos from the VIRAT public dataset, the KTH dataset, and a set of videos obtained from YouTube. The framework is fast (it runs at nearly 3 times real time), and on the KTH dataset it outperforms three recent approaches.
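As an illustration of the distance between a motion string and a regular expression, the following is a minimal sketch and not the authors' implementation. It assumes headings are quantized to the symbols N, E, S, W, that a U-turn is described by the hypothetical expression N+E+S+ (hand-compiled here into a small DFA), and that confidence is one minus the distance normalized by string length; none of these choices comes from the abstract. The distance is computed as a shortest path in the product graph of string positions and automaton states, which keeps the search polynomial in the string length and the automaton size.

```python
import heapq

# Hypothetical DFA for the regular expression N+E+S+ (an assumed U-turn
# signature over quantized headings; not taken from the paper).
U_TURN = {
    "start": 0,
    "accept": {3},
    "delta": {
        0: {"N": 1},
        1: {"N": 1, "E": 2},
        2: {"E": 2, "S": 3},
        3: {"S": 3},
    },
}


def edit_distance_to_language(s, dfa):
    """Minimum Levenshtein distance from s to any string the DFA accepts.

    Runs Dijkstra over nodes (i, q): i symbols of s consumed, DFA in state q.
    Returns None if no accepting state is reachable.
    """
    n = len(s)
    start = (0, dfa["start"])
    dist = {start: 0}
    heap = [(0, 0, dfa["start"])]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist.get((i, q), float("inf")):
            continue  # stale queue entry
        if i == n and q in dfa["accept"]:
            return d  # first accepting node popped is optimal
        edges = []
        if i < n:
            edges.append((i + 1, q, 1))  # delete s[i]
        for sym, q2 in dfa["delta"].get(q, {}).items():
            edges.append((i, q2, 1))  # insert sym
            if i < n:
                # match costs 0, substitution costs 1
                edges.append((i + 1, q2, 0 if sym == s[i] else 1))
        for i2, q2, w in edges:
            nd = d + w
            if nd < dist.get((i2, q2), float("inf")):
                dist[(i2, q2)] = nd
                heapq.heappush(heap, (nd, i2, q2))
    return None


def confidence(s, dfa):
    """Assumed normalization: 1 - distance / length, clipped at 0."""
    d = edit_distance_to_language(s, dfa)
    if d is None:
        return 0.0
    return max(0.0, 1.0 - d / max(len(s), 1))
```

Under these assumptions, a clean signature such as "NNEESS" has distance 0, while a noisy one such as "NNWSS" has distance 1, because a single substitution (W to E) brings it into the language.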
