Learning and understanding dynamic scene activity: a review

We are entering an era of more intelligent cognitive vision systems. Such systems can analyse activity in dynamic scenes to compute conceptual descriptions from the motion trajectories of moving people and the objects they interact with. Here we review progress in the development of flexible, generative models that can explain visual input as a combination of hidden variables and can adapt to new types of input. Such models are particularly appropriate for the tasks posed by cognitive vision because they incorporate learning and have sufficient structure to represent a general class of problems. In addition, generative models explain all aspects of the input rather than attempting to ignore irrelevant sources of variation, as in exemplar-based learning. Applications of these models in visual interaction for education, in smart rooms and cars, and in surveillance systems are also briefly reviewed.
