Distribution-Based Dimensionality Reduction Applied to Articulated Motion Recognition

Some articulated motion representations rely on frame-wise abstractions of the statistical distribution of low-level features such as orientation, color, or relational distributions. As configuration among parts changes with articulated motion, the distribution changes, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition using standard techniques such as dynamic time warping. The core theory in this paper concerns embedding the frame-wise distributions, which can be looked upon as probability functions, into a low-dimensional space so that we can estimate various meaningful probabilistic distances such as the Chernoff, Bhattacharya, Matusita, Kullback-Leibler (KL) or symmetric-KL distances based on dot products between points in this space. Apart from computational advantages, this representation also affords speed-normalized matching of motion signatures. Speed normalized representations can be formed by interpolating the configuration trajectories along their arc lengths, without using any knowledge of the temporal scale variations between the sequences. We experiment with five different probabilistic distance measures and show the usefulness of the representation in three different contexts - sign recognition (with large number of possible classes), gesture recognition (with person variations), and classification of human-human interaction sequences (with segmentation problems). We find the importance of using the right distance measure for each situation. The low-dimensional embedding makes matching two to three times faster, while achieving recognition accuracies that are close to those obtained without using a low-dimensional embedding. We also empirically establish the robustness of the representation with respect to low-level parameters, embedding parameters, and temporal-scale parameters.

[1]  Roberto Cipolla,et al.  Continuous Gesture Recognition using a Sparse Bayesian Classifier , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[2]  Sudeep Sarkar,et al.  Unsupervised Modeling of Signs Embedded in Continuous Sentences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[3]  Rama Chellappa,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Matching Shape Sequences in Video with Applications in Human Movement Analysis. Ieee Transactions on Pattern Analysis and Machine Intelligence 2 , 2022 .

[4]  Aaron F. Bobick,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Thomas B. Moeslund,et al.  A Survey of Computer Vision-Based Human Motion Capture , 2001, Comput. Vis. Image Underst..

[6]  H. Chernoff A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations , 1952 .

[7]  K. Matusita Decision Rules, Based on the Distance, for Problems of Fit, Two Samples, and Estimation , 1955 .

[8]  Dariu Gavrila,et al.  The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..

[9]  Agnès Just,et al.  Two-Handed Gesture Recognition , 2005 .

[10]  Aaron F. Bobick,et al.  A State-Based Approach to the Representation and Recognition of Gesture , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Richard Bowden,et al.  Large Lexicon Detection of Sign Language , 2007, ICCV-HCI.

[12]  Sudeep Sarkar,et al.  Modeling Signs Using Functional Data Analysis , 2004, ICVGIP.

[13]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[14]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[15]  Michal Irani,et al.  Aligning Sequences and Actions by Maximizing Space-Time Correlations , 2006, ECCV.

[16]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[17]  Cristian Sminchisescu,et al.  Conditional models for contextual human motion recognition , 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[20]  Jake K. Aggarwal,et al.  Human Motion Analysis: A Review , 1999, Comput. Vis. Image Underst..

[21]  Tieniu Tan,et al.  Real time hand tracking by combining particle filtering and mean shift , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[22]  Sudeep Sarkar,et al.  Statistical Motion Model Based on the Change of Feature Relationships: Human Gait-Based Recognition , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[24]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[25]  Hermann Ney,et al.  Combination of Tangent Distance and an Image Distortion Model for Appearance-Based Sign Language Recognition , 2005, DAGM-Symposium.

[26]  Rama Chellappa,et al.  "Shape Activity": a continuous-state HMM for moving/deforming shapes with application to abnormal activity detection , 2005, IEEE Transactions on Image Processing.

[27]  Wen Gao,et al.  An approach based on phonemes to large vocabulary Chinese sign language recognition , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[28]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[29]  Michael J. Black,et al.  An Adaptive Appearance Model Approach for Model-based Articulated Object Tracking , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Jiebo Luo,et al.  Robust Color Object Detection Using Spatial-Color Joint Probability Functions , 2004, CVPR.

[31]  HiltonAdrian,et al.  A survey of advances in vision-based human motion capture and analysis , 2006 .

[32]  Hong Li,et al.  Multi-scale gesture recognition from time-varying contours , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[33]  Jin-Hyung Kim,et al.  An HMM-Based Threshold Model Approach for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  William T. Freeman,et al.  Orientation Histograms for Hand Gesture Recognition , 1995 .

[35]  Osama Masoud,et al.  A method for human action recognition , 2003, Image Vis. Comput..

[36]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[37]  James C. Bezdek,et al.  Convergence of Alternating Optimization , 2003, Neural Parallel Sci. Comput..

[38]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[39]  Abdesselam Bouzerdoum,et al.  Skin segmentation using color pixel classification: analysis and comparison , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[41]  Narendra Ahuja,et al.  Extraction of 2D Motion Trajectories and Its Application to Hand Gesture Recognition , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[42]  David J. Fleet,et al.  Temporal motion models for monocular and multiview 3D human body tracking , 2006, Comput. Vis. Image Underst..

[43]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[44]  Yaser Sheikh,et al.  Exploring the space of a human action , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[45]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[46]  Lihi Zelnik-Manor,et al.  Statistical analysis of dynamic actions , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Nikolaus F. Troje,et al.  3D Periodic Human Motion Reconstruction from 2D Motion Sequences , 2007, Neural Computation.

[48]  Agnès Just,et al.  A comparative study of two state-of-the-art sequence processing techniques for hand gesture recognition , 2009, Comput. Vis. Image Underst..

[49]  Martial Hebert,et al.  Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[50]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[51]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[52]  Robert W. Lindeman,et al.  A new instrumented approach for translating American Sign Language into sound and text , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[53]  V C Tartter,et al.  Perceiving minimal distinctions in ASL under normal and point-light display conditions , 1982, Perception & psychophysics.

[54]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[55]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..

[56]  Trevor Darrell,et al.  Hidden Conditional Random Fields for Gesture Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[57]  U. Bellugi,et al.  Perception of American sign language in dynamic point-light displays. , 1981, Journal of experimental psychology. Human perception and performance.