Towards Automatic Discovery of Human Movemes

Consider a set of moving points, each attached to a joint of the human body and projected onto an image. Johansson showed that humans can effortlessly detect and recognize other humans in such displays. This holds even when some body parts are missing (e.g., because of occlusion) and unrelated clutter points are added to the display. Furthermore, subtle attributes such as age range and gender, as well as the ongoing activity, can be inferred with surprising accuracy from such seemingly scarce information. We are interested in replicating some of these abilities in a machine. We start by introducing a labeling and detection scheme for Johansson-like displays. Our method is based on a probabilistic representation of the positions and motions of body parts, which we use to compute a likely interpretation of the scene by means of belief propagation techniques. We show how learning and inference can be done efficiently, and we provide an experimental validation of the method. In the second part of our work, we present our position on the analysis of human behaviors. We hypothesize a hierarchical description of motion, which provides a natural interpretation of actions and activities as stochastic sequences of "atomic motions," or movemes. We take an initial step in that direction by illustrating how to learn a dictionary of movemes from the trajectories of body parts; this dictionary can then be used to represent the video concisely for further analysis.

[1]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[2]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[3]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[4]  J. Cutting,et al.  Recognizing friends by their walk: Gait perception without familiarity cues , 1977 .

[5]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[6]  Yaakov Bar-Shalom,et al.  Tracking methods in a multitarget environment , 1978 .

[7]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[8]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[9]  R. Shumway,et al.  AN APPROACH TO TIME SERIES SMOOTHING AND FORECASTING USING THE EM ALGORITHM , 1982 .

[10]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[11]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[12]  R. Shumway,et al.  Dynamic linear models with switching , 1991 .

[13]  C. Tomasi Detection and Tracking of Point Features , 1991 .

[14]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Yaakov Bar-Shalom,et al.  Estimation and Tracking: Principles, Techniques, and Software , 1993 .

[16]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[17]  Edward H. Adelson,et al.  Analyzing and recognizing walking figures in XYT , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Chang‐Jin Kim,et al.  Dynamic linear models with Markov-switching , 1994 .

[19]  Wai Lam,et al.  LEARNING BAYESIAN BELIEF NETWORKS: AN APPROACH BASED ON THE MDL PRINCIPLE , 1994, Comput. Intell..

[20]  G. Mather,et al.  Gender discrimination in biological motion displays based on dynamic cues , 1994, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[21]  David Maxwell Chickering,et al.  Learning Bayesian Networks is NP-Complete , 2016, AISTATS.

[22]  Geoffrey E. Hinton,et al.  Parameter estimation for linear dynamical systems , 1996 .

[23]  S. Lea,et al.  Perception of Emotion from Dynamic Point-Light Displays Represented in Dance , 1996, Perception.

[24]  Michael Isard,et al.  The CONDENSATION Algorithm - Conditional Density Propagation and Applications to Visual Tracking , 1996, NIPS.

[25]  Geoffrey E. Hinton,et al.  Switching State-Space Models , 1996 .

[26]  Zoubin Ghahramani,et al.  Learning Dynamic Bayesian Networks , 1997, Summer School on Neural Networks.

[27]  Yangsheng Xu,et al.  Human action learning via hidden Markov model , 1997, IEEE Trans. Syst. Man Cybern. Part A.

[28]  A F Bobick,et al.  Movement, activity and action: the role of knowledge in the perception of motion. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[29]  Karl Rohr,et al.  Human Movement Analysis Based on Explicit Motion Models , 1997 .

[30]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Christoph Bregler,et al.  Learning and recognizing human dynamics in video sequences , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Aaron F. Bobick,et al.  A State-Based Approach to the Representation and Recognition of Gesture , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Xavier Boyen,et al.  Approximate Learning of Dynamic Models , 1998, NIPS.

[34]  Kevin Murphy,et al.  Switching Kalman Filters , 1998 .

[35]  Michael I. Jordan Graphical Models , 1998 .

[36]  Michael I. Jordan,et al.  Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.

[37]  Padhraic Smyth,et al.  Discovering Chinese Words from Unsegmented Text , 1999, SIGIR 1999.

[38]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[39]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[40]  Robert J. McEliece,et al.  The generalized distributive law , 2000, IEEE Trans. Inf. Theory.

[41]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[42]  Liang Zhao,et al.  Stereo- and neural network-based pedestrian detection , 2000, IEEE Trans. Intell. Transp. Syst..

[43]  Xavier Binefa,et al.  Robust Real-Time Periodic Motion Detection, Analysis, and Applications , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[44]  Geoffrey E. Hinton,et al.  Variational Learning for Switching State-Space Models , 2000, Neural Computation.

[45]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[46]  Dariu Gavrila,et al.  Pedestrian Detection from a Moving Vehicle , 2000, ECCV.

[47]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[48]  Yang Song,et al.  Monocuolar Perception of Biological Motion - Clutter and Partial Occlusion , 2000, ECCV.

[49]  Vladimir Pavlovic,et al.  Impact of dynamic model learning on classification of human motion , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[50]  Michael Isard,et al.  Learning and Classification of Complex Dynamics , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[51]  Yang Song,et al.  Learning probabilistic structure for human motion detection , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[52]  David A. Forsyth,et al.  Human Tracking with Mixtures of Trees , 2001, ICCV.

[53]  William T. Freeman,et al.  On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs , 2001, IEEE Trans. Inf. Theory.

[54]  Jeffrey M. Zacks,et al.  Event structure in perception and conception. , 2001, Psychological bulletin.

[55]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[56]  Yang Song,et al.  Unsupervised Learning of Human Motion Models , 2001, NIPS.

[57]  Stefano Soatto,et al.  Recognition of human gaits , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[58]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[59]  Pedro Larrañaga,et al.  An Introduction to Probabilistic Graphical Models , 2002, Estimation of Distribution Algorithms.

[60]  Stuart J. Russell,et al.  Dynamic bayesian networks: representation, inference and learning , 2002 .

[61]  Michael I. Jordan,et al.  Learning Graphical Models with Mercer Kernels , 2002, NIPS.

[62]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[63]  P. Perona,et al.  Primitives for Human Motion: a Dynamical Approach , 2002 .

[64]  Pietro Perona,et al.  Human action recognition by sequence of movelet codewords , 2002, Proceedings. First International Symposium on 3D Data Processing Visualization and Transmission.

[65]  Mark A. Paskin Sample Propagation , 2003, NIPS.

[66]  Pietro Perona,et al.  Decomposition of human motion into dynamics-based primitives with application to drawing tasks , 2003, Autom..

[67]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[68]  P. Perona,et al.  Classification of human actions into dynamics based primitives with application to drawing tasks , 2003, 2003 European Control Conference (ECC).

[69]  J. L. Roux An Introduction to the Kalman Filter , 2003 .

[70]  R.M. Murray,et al.  Segmentation of human motion into dynamics based primitives with application to drawing tasks , 2003, Proceedings of the 2003 American Control Conference, 2003..

[71]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[73]  Paolo Giudici,et al.  Improving Markov Chain Monte Carlo Model Search for Data Mining , 2004, Machine Learning.

[74]  Masamichi Shimosaka,et al.  Hierarchical recognition of daily human actions based on Continuous Hidden Markov Models , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[75]  Alexei A. Efros,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[76]  H. Harlyn Baker,et al.  Building surfaces of evolution: The Weaving Wall , 1989, International Journal of Computer Vision.

[77]  Tomaso A. Poggio,et al.  A Trainable System for Object Detection , 2000, International Journal of Computer Vision.

[78]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[79]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[80]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[81]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[82]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[83]  James M. Rehg,et al.  Learning and inference in parametric switching linear dynamic systems , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[84]  David A. Forsyth,et al.  Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis , 2005, Found. Trends Comput. Graph. Vis..

[85]  William T. Freeman,et al.  Constructing free-energy approximations and generalized belief propagation algorithms , 2005, IEEE Transactions on Information Theory.

[86]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[87]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[88]  Zhuowen Tu,et al.  Feature Mining for Image Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[89]  David A. Forsyth,et al.  Searching Video for Complex Activities with Finite State Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Stefano Soatto,et al.  Classification and Recognition of Dynamical Models: The Role of Phase, Independent Components, Kernels and Optimal Transport , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[91]  George Loizou,et al.  Computer vision and pattern recognition , 2007, Int. J. Comput. Math..

[92]  Sunita Sarawagi Learning with Graphical Models , 2008 .

[93]  JU SHANONX. RECOGNIZING HUMAN MOTION USING PARAMETERIZED MODELS OF OPTICAL FLOW , 2022 .