Local Temporal Coherence for Object-Aware Keypoint Selection in Video Sequences

Local feature extraction is a core technique in video analysis. A typical pipeline consists of a local keypoint detector followed by a keypoint descriptor. Existing keypoint detectors mainly consider the spatial relationships among pixels, so they produce a large number of redundant keypoints on the background, which is often temporally stationary. This paper proposes an object-aware local keypoint selection approach that keeps the active keypoints on objects and discards the redundant keypoints on the background by exploiting the temporal coherence among successive video frames. The approach combines three local temporal coherence criteria: (1) local temporal intensity coherence; (2) local temporal motion coherence; (3) local temporal orientation coherence. Experimental results on two publicly available datasets show that the proposed approach removes more than 60% of the keypoints, which are redundant, and doubles keypoint precision.
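
As a rough illustration of the first criterion only, the sketch below filters keypoints by how much the intensity in a small patch changes between two successive frames. It is not the paper's formulation: the function names, the mean-absolute-difference measure, the patch radius, and the threshold are all assumptions chosen for illustration, and the motion and orientation criteria are not covered.

```python
def patch_mean_abs_diff(prev_frame, curr_frame, x, y, radius):
    """Mean absolute intensity change in a square patch centered on (x, y).

    Frames are 2-D lists of grayscale values indexed as frame[row][col].
    The patch is clipped at the image border.
    """
    height, width = len(curr_frame), len(curr_frame[0])
    total, count = 0.0, 0
    for row in range(max(0, y - radius), min(height, y + radius + 1)):
        for col in range(max(0, x - radius), min(width, x + radius + 1)):
            total += abs(curr_frame[row][col] - prev_frame[row][col])
            count += 1
    return total / count


def select_active_keypoints(prev_frame, curr_frame, keypoints,
                            radius=1, threshold=10.0):
    """Keep keypoints whose local patch changed between frames.

    Hypothetical stand-in for a temporal intensity coherence test:
    a keypoint on a stationary background has a near-zero patch
    difference and is dropped; a keypoint on a changing region is kept.
    `radius` and `threshold` are illustrative values, not from the paper.
    """
    return [
        (x, y) for (x, y) in keypoints
        if patch_mean_abs_diff(prev_frame, curr_frame, x, y, radius) > threshold
    ]


# Toy example: a 6x6 static background with a 2x2 bright block appearing.
prev_frame = [[50] * 6 for _ in range(6)]
curr_frame = [row[:] for row in prev_frame]
for r in (1, 2):
    for c in (1, 2):
        curr_frame[r][c] = 200

keypoints = [(1, 1), (4, 4)]  # (x, y): one on the block, one on background
active = select_active_keypoints(prev_frame, curr_frame, keypoints)
print(active)
```

In this toy case the keypoint at (1, 1) lies on the region that changed and is kept, while the keypoint at (4, 4) lies on unchanged background and is discarded. A real detector output (for example FAST or Harris corners) would take the place of the hand-written keypoint list.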
