Fast Motion Consistency through Matrix Quantization

Determining the motion consistency between two video clips is a key component of many applications, such as video event detection and human pose estimation. Shechtman and Irani recently proposed a method for measuring the motion consistency between two videos that represents the motion around each point with a space-time Harris matrix of spatial and temporal derivatives. This allows a motion-consistency measure to be estimated accurately without explicitly computing optical flow, which can be noisy. However, the consistency calculation is computationally expensive, and it must be evaluated for every pair of points across the two videos. We propose a novel quantization method for space-time Harris matrices that reduces the consistency calculation to a fast table lookup for any arbitrary consistency measure. We demonstrate that, for the continuous rank-drop consistency measure used by Shechtman and Irani, our quantization method is much faster than the existing approximation while achieving the same accuracy.
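The core idea is to replace the per-pair eigen-analysis with a precomputed table: each space-time Harris matrix is quantized to a codeword, the consistency measure is evaluated once for every pair of codewords, and the measure between any two points then reduces to an array index. The sketch below illustrates this in Python under stated assumptions: the continuous rank measure is approximated as det(M) / det(M_spatial) with the pairwise inconsistency taken as the rank increase of the summed matrix normalized by the smaller of the two individual ones (our reading of Shechtman and Irani's formulation, not a verified reproduction), and the names `assign_codes` and `build_lookup_table` are hypothetical, with codewords assumed to come from, e.g., k-means over training patches.

```python
# Minimal sketch of matrix quantization + table lookup for motion consistency.
# Not the authors' implementation; the rank-increase formula below is an
# assumption based on our reading of Shechtman and Irani's measure.

import numpy as np

def rank_increase(M):
    """Continuous rank-increase of a 3x3 space-time Harris matrix M,
    approximated here as det(M) / det(M_spatial), where M_spatial is the
    upper-left 2x2 (purely spatial) submatrix. Small values suggest the
    patch's motion is well explained by a single flow."""
    det_spatial = np.linalg.det(M[:2, :2])
    return np.linalg.det(M) / max(det_spatial, 1e-12)  # guard near-singular

def inconsistency(M1, M2):
    """Pairwise motion inconsistency between two patches: rank increase of
    the summed matrix, normalized by the smaller individual rank increase
    (assumed form of the Shechtman-Irani consistency measure)."""
    r12 = rank_increase(M1 + M2)
    return r12 / max(min(rank_increase(M1), rank_increase(M2)), 1e-12)

def assign_codes(matrices, codebook):
    """Quantize each 3x3 matrix to its nearest codeword (Frobenius distance).
    matrices: (N, 3, 3), codebook: (K, 3, 3) -> (N,) integer codes."""
    flat = matrices.reshape(len(matrices), -1)   # (N, 9)
    cb = codebook.reshape(len(codebook), -1)     # (K, 9)
    d = ((flat[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def build_lookup_table(codebook, measure=inconsistency):
    """Precompute measure(c_i, c_j) for all codeword pairs; any pairwise
    consistency measure can be plugged in here, which is what makes the
    lookup approach measure-agnostic."""
    K = len(codebook)
    table = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            table[i, j] = measure(codebook[i], codebook[j])
    return table

# Usage: after quantization, the measure between any two points is O(1).
# codes1 = assign_codes(harris_video1, codebook)
# codes2 = assign_codes(harris_video2, codebook)
# table  = build_lookup_table(codebook)
# score  = table[codes1[p], codes2[q]]  # replaces per-pair eigen computation
```

The expensive eigen-style computation is paid only K^2 times for a codebook of size K, rather than once per pair of points, which is where the speedup over the direct calculation comes from.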

[1] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004, Int. J. Comput. Vis.

[2] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3] David Nistér, et al. Scalable Recognition with a Vocabulary Tree, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5] Ronen Basri, et al. Actions as space-time shapes, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[6] Cordelia Schmid, et al. Vector Quantizing Feature Space with a Regular Lattice, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7] Jitendra Malik, et al. Recognizing action at a distance, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8] Bela Julesz. Textons, the elements of texture perception, and their interactions, 1981, Nature.

[9] Martial Hebert, et al. Efficient visual event detection using volumetric features, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10] James W. Davis, et al. The Recognition of Human Movement Using Temporal Templates, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[11] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2006, BMVC.

[12] Hanno Scharr, et al. Accurate Optical Flow in Noisy Image Sequences, 2001, ICCV.

[13] Martial Hebert, et al. Local detection of occlusion boundaries in video, 2009, Image Vis. Comput.

[14] Martial Hebert, et al. Event Detection in Crowded Videos, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15] Gösta H. Granlund, et al. Optical Flow Based on the Inertia Matrix of the Frequency Domain, 1988.

[16] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17] Alireza Fathi, et al. Human Pose Estimation using Motion Exemplars, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18] Song-Chun Zhu, et al. What are Textons?, 2005, Int. J. Comput. Vis.

[19] Eli Shechtman, et al. Space-Time Behavior-Based Correlation-OR-How to Tell If Two Underlying Motion Fields Are Similar Without Computing Them?, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Patrick Pérez, et al. Retrieving actions in movies, 2007, 2007 IEEE 11th International Conference on Computer Vision.