Extraction of 2D Motion Trajectories and Its Application to Hand Gesture Recognition

We present an algorithm for extracting and classifying two-dimensional motion in an image sequence based on motion trajectories. First, a multiscale segmentation is performed to generate homogeneous regions in each frame. Regions in consecutive frames are then matched to obtain two-view correspondences. An affine transformation is computed from each pair of corresponding regions to define pixel matches. Pixel matches over consecutive image pairs are concatenated to obtain pixel-level motion trajectories across the image sequence. Motion patterns are learned from the extracted trajectories using a time-delay neural network. We apply the proposed method to recognize 40 hand gestures of American Sign Language. Experimental results show that motion patterns of hand gestures can be extracted and recognized accurately using motion trajectories.
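
The concatenation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the affine transformation for the region containing a pixel is already known for each consecutive frame pair, and it simply chains those transformations to trace one pixel through the sequence. The names `Affine`, `apply_affine`, and `build_trajectory`, and the six-parameter layout of the transform, are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed layout (a, b, c, d, tx, ty): x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]


def apply_affine(transform: Affine, point: Point) -> Point:
    """Map a pixel position into the next frame using one affine transform."""
    a, b, c, d, tx, ty = transform
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def build_trajectory(start: Point, transforms: Sequence[Affine]) -> List[Point]:
    """Concatenate frame-to-frame pixel matches into a single trajectory.

    transforms[k] is assumed to be the affine transform of the region that
    contains the tracked pixel between frame k and frame k + 1.
    """
    trajectory = [start]
    for transform in transforms:
        trajectory.append(apply_affine(transform, trajectory[-1]))
    return trajectory


# Two frame pairs: a translation by (2, 0), then a 90-degree rotation.
transforms = [
    (1.0, 0.0, 0.0, 1.0, 2.0, 0.0),
    (0.0, -1.0, 1.0, 0.0, 0.0, 0.0),
]
trajectory = build_trajectory((1.0, 1.0), transforms)
print(trajectory)  # [(1.0, 1.0), (3.0, 1.0), (-1.0, 3.0)]
```

In a full pipeline the transform used at each step would depend on which matched region the pixel falls into in that frame; the sketch fixes one transform per frame pair to keep the chaining logic visible.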
