Semantic video event search for surveillance video

We present a distributed framework for understanding, indexing, and searching complex events in large volumes of surveillance video. Video events and the relationships between scene entities are represented by Spatio-Temporal And-Or Graphs (ST-AOGs) and inferred in a distributed computing system using a combined bottom-up and top-down strategy. We propose a method for sub-graph indexing of the ST-AOGs of recognized events, which supports robust retrieval and fast search. Plain-text reports of the scene are generated automatically to describe the relationships between scene entities, contextual information, and events of interest. A query can be given as keywords, plain text, voice, or a video clip; it is parsed, and the closest events are retrieved using text descriptions and sub-graph matching.
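
To make the indexing and retrieval idea concrete, here is a minimal sketch. It is not the paper's method. It assumes each recognized event graph has already been flattened into (subject, relation, object) triples, and it replaces ST-AOG sub-graph matching with a plain inverted index scored by Jaccard overlap. The clip identifiers, the triples, and the function names `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data: each clip's recognized event graph, flattened into
# (subject, relation, object) triples. Real ST-AOG parse graphs are richer.
EVENTS = {
    "clip_01": [("person", "enters", "car"), ("car", "leaves", "parking_lot")],
    "clip_02": [("person", "exits", "building"), ("person", "enters", "car")],
    "clip_03": [("truck", "stops_at", "gate")],
}


def build_index(events):
    """Build an inverted index mapping each triple to the clips containing it."""
    index = defaultdict(set)
    for event_id, triples in events.items():
        for triple in triples:
            index[triple].add(event_id)
    return index


def search(index, events, query_triples, top_k=3):
    """Rank clips by Jaccard overlap between query triples and clip triples."""
    query = set(query_triples)
    # Only clips sharing at least one triple with the query are candidates.
    candidates = set()
    for triple in query:
        candidates |= index.get(triple, set())
    scored = []
    for event_id in candidates:
        event = set(events[event_id])
        score = len(query & event) / len(query | event)
        scored.append((score, event_id))
    # Highest score first; ties are broken by clip identifier.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(event_id, round(score, 3)) for score, event_id in scored[:top_k]]


INDEX = build_index(EVENTS)
RESULTS = search(
    INDEX,
    EVENTS,
    [("person", "enters", "car"), ("car", "leaves", "parking_lot")],
)
print(RESULTS)
```

The inverted index restricts scoring to clips that share at least one triple with the query, which is the same motivation as indexing sub-graphs rather than comparing the query against every stored graph. A keyword or plain-text query would first have to be parsed into triples of this form, a step this sketch leaves out.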
