Recognition of Visual Activities and Interactions by Stochastic Parsing

This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. At the lower level, standard independent probabilistic event detectors propose candidate detections of low-level features. The outputs of these detectors form the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser impose longer-range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple interacting objects.
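The two-level idea can be illustrated with a small sketch. It assumes a grammar in Chomsky normal form and exactly one detector output per time step, where each output is a set of candidate event labels with likelihoods. It uses the Viterbi variant of the CKY algorithm, a simple stand-in chosen for brevity that is not presented as the authors' parser. The function name `viterbi_parse`, the rule dictionaries, and the UP/DOWN gesture grammar are all hypothetical. The example shows the grammar overriding a weak low-level detection: read frame by frame, the second step looks more like "up", but the grammar only admits "up" followed by "down".

```python
from collections import defaultdict
from typing import Optional

# Grammar format assumed for this sketch (Chomsky normal form):
#   binary rules:  (A, B, C) -> P(A -> B C)
#   lexical rules: (A, a)    -> P(A -> a), where a is a low-level event label
Binary = dict[tuple[str, str, str], float]
Lexical = dict[tuple[str, str], float]


def viterbi_parse(stream: list[dict[str, float]], binary: Binary,
                  lexical: Lexical, start: str = "S") -> tuple[float, Optional[tuple]]:
    """Return (probability, tree) of the most likely parse of `stream`.

    `stream` holds one dict per time step, mapping each candidate event label
    to the (assumed independent) likelihood reported by a low-level detector.
    Returns (0.0, None) if the grammar admits no parse.
    """
    n = len(stream)
    # best[(i, j)][A] = (score, tree) for nonterminal A covering stream[i:j]
    best: dict = defaultdict(dict)

    # Length-1 spans: combine lexical rule probability with detector likelihood.
    for i, candidates in enumerate(stream):
        for (lhs, label), p_rule in lexical.items():
            p = p_rule * candidates.get(label, 0.0)
            if p > best[(i, i + 1)].get(lhs, (0.0, None))[0]:
                best[(i, i + 1)][lhs] = (p, (lhs, label))

    # Longer spans: keep only the best-scoring split and rule for each nonterminal.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = best[(i, k)], best[(k, j)]
                for (lhs, b, c), p_rule in binary.items():
                    if b in left and c in right:
                        p = p_rule * left[b][0] * right[c][0]
                        if p > best[(i, j)].get(lhs, (0.0, None))[0]:
                            best[(i, j)][lhs] = (p, (lhs, left[b][1], right[c][1]))

    return best[(0, n)].get(start, (0.0, None))


# Hypothetical gesture grammar: a gesture is an upward stroke then a downward one.
binary_rules: Binary = {("S", "UP", "DOWN"): 1.0}
lexical_rules: Lexical = {("UP", "up"): 1.0, ("DOWN", "down"): 1.0}

# Hypothetical detector outputs; taken frame by frame, step 2 would read as "up".
stream = [{"up": 0.9, "down": 0.1}, {"up": 0.55, "down": 0.45}]

prob, tree = viterbi_parse(stream, binary_rules, lexical_rules)
print(prob, tree)
```

Taking the maximum over derivations (Viterbi) yields a single best interpretation of the event stream; summing over derivations instead would give the total probability that the stream is an instance of the activity, which is the quantity to use when only detection, not segmentation, is needed.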
