Attribute Multiset Grammars for Global Explanations of Activities

Recognizing multiple interleaved activities in a video requires implicitly partitioning the detections among the activities. Furthermore, constraints between activities are important for finding valid explanations of all detections. We use Attribute Multiset Grammars (AMGs) as a formal representation of a domain’s knowledge, encoding both intra- and inter-activity constraints. We show how AMGs can be used to parse all the observations into ‘feasible’ global explanations. We also present an algorithm for building a Bayesian network (BN) from an AMG and a set of detections. The set of labelings of the BN corresponds to the set of all possible parse trees, so finding the best explanation amounts to finding the maximum a posteriori (MAP) labeling of the BN. The technique is applied successfully to two different problems, including the challenging problem of associating pedestrians and carried objects entering and departing a building.
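
The abstract does not give the network construction itself. The following is a minimal sketch, assuming a hypothetical toy problem, of what "MAP labeling under an inter-activity constraint" means. Three people enter (A, B, C) and two leave (x, y). Each exit variable is labeled with the entry that explains it, and a deterministic constraint node allows each entry to explain at most one exit. The likelihood values, variable names, and the brute-force enumeration are all illustrative assumptions, not the authors' method. A real system would run proper BN inference rather than enumerating every labeling.

```python
from itertools import product
from math import prod

# Hypothetical toy problem: entries A, B, C and exits x, y.
ENTRIES = ["A", "B", "C"]
EXITS = ["x", "y"]

# Hypothetical local likelihoods P(exit observation | explained by entry),
# e.g. from appearance similarity. Values are made up for illustration.
LIKELIHOOD = {
    "x": {"A": 0.6, "B": 0.3, "C": 0.1},
    "y": {"A": 0.5, "B": 0.2, "C": 0.3},
}

def constraint_ok(labeling):
    """Deterministic constraint node: an entry may explain at most one exit."""
    chosen = list(labeling.values())
    return len(chosen) == len(set(chosen))

def score(labeling):
    """Unnormalised posterior: product of local terms, or 0 if the constraint fails."""
    if not constraint_ok(labeling):
        return 0.0
    return prod(LIKELIHOOD[e][labeling[e]] for e in EXITS)

def map_labeling():
    """Enumerate every labeling exhaustively and keep the highest-scoring one."""
    best, best_score = None, -1.0
    for assignment in product(ENTRIES, repeat=len(EXITS)):
        labeling = dict(zip(EXITS, assignment))
        s = score(labeling)
        if s > best_score:
            best, best_score = labeling, s
    return best, best_score
```

Calling `map_labeling()` returns the labeling `{'x': 'A', 'y': 'C'}`. If each exit were chosen independently, both exits would be assigned to A (score 0.30). The constraint rules that out, and this is the sense in which constraints between activities shape the global explanation.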
