A Novel Rate Control Framework for SIFT/SURF Feature Preservation in H.264/AVC Video Compression

This paper presents a novel rate control framework for H.264/Advanced Video Coding (AVC) that improves the preservation of gradient-based features, such as the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF), compared with the default rate control algorithm in the JM reference software. First, we propose a criterion for feature preservation, the matching score, based on the bag-of-features concept. Then, the matching scores are collected as a function of the quantization parameter (QP) and analyzed for different feature types. Based on this analysis, macroblocks are categorized into groups before encoding, and our rate control algorithm assigns a different QP to each group according to the group's importance for feature extraction. The experimental results show that our rate control algorithm achieves the target bit rate while preserving more features than videos encoded with the default rate control. Beyond improving feature preservation, the proposed approach also yields a noticeable performance improvement in a real image retrieval system. The proposed rate control framework is fully compatible with the H.264/AVC standard.
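The abstract does not spell out how the bag-of-features matching score is computed. The sketch below is a minimal illustration under a stated assumption, not the paper's definition. Descriptors from the original and the decoded frame are quantized to their nearest visual word. The score is then taken as the histogram intersection of the two bag-of-features histograms, divided by the number of features in the original frame. The function names, the toy vocabulary, and this normalization are all hypothetical.

```python
from collections import Counter
from typing import Sequence

Descriptor = Sequence[float]


def quantize(descriptor: Descriptor, vocabulary: Sequence[Descriptor]) -> int:
    """Return the index of the nearest visual word (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[k])),
    )


def bof_histogram(descriptors: Sequence[Descriptor], vocabulary: Sequence[Descriptor]) -> Counter:
    """Bag-of-features histogram: number of descriptors assigned to each visual word."""
    return Counter(quantize(d, vocabulary) for d in descriptors)


def matching_score(
    orig_descriptors: Sequence[Descriptor],
    decoded_descriptors: Sequence[Descriptor],
    vocabulary: Sequence[Descriptor],
) -> float:
    """Hypothetical matching score: histogram intersection between the original and
    decoded BoF histograms, normalized by the feature count of the original frame."""
    h_orig = bof_histogram(orig_descriptors, vocabulary)
    h_dec = bof_histogram(decoded_descriptors, vocabulary)
    if not h_orig:
        return 0.0  # no features in the original frame: score is undefined, report 0
    # A missing key in a Counter counts as 0, so lost visual words contribute nothing.
    common = sum(min(count, h_dec[word]) for word, count in h_orig.items())
    return common / sum(h_orig.values())


# Toy example with a 3-word vocabulary of 2-D "descriptors".
vocab = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
original = [[0.1, 0.0], [1.0, 1.2], [5.0, 4.9], [4.8, 5.0]]  # words 0, 1, 2, 2
decoded = [[0.0, 0.2], [5.0, 5.0]]                          # words 0, 2
print(matching_score(original, decoded, vocab))
```

Here two of the four original features survive compression, so the example prints 0.5. Sweeping such a score over QP values would give the kind of score-versus-QP curves the abstract describes. Real SIFT or SURF descriptors and a trained vocabulary would replace the toy vectors.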

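The abstract says that macroblocks are grouped before encoding and that each group receives its own QP. H.264/AVC allows this in a standard-compatible way because each macroblock can signal a QP change through `mb_qp_delta`. The following is a hedged sketch of that step only. It assumes that importance is measured by the number of keypoints detected in each 16x16 macroblock. The thresholds and per-group QP offsets are invented for illustration and are not taken from the paper. The offsets are re-centered so that the average QP stays near the frame-level QP chosen by a conventional rate controller. This is a simplification, since bits do not scale linearly with QP.

```python
from typing import List, Sequence

# Hypothetical grouping rule: group 0 is most important for feature extraction.
GROUP_THRESHOLDS = (4, 1)                 # >=4 keypoints -> group 0, >=1 -> group 1, else group 2
GROUP_QP_OFFSETS = {0: -4, 1: -1, 2: 3}   # illustrative offsets, not from the paper

QP_MIN, QP_MAX = 0, 51  # H.264/AVC QP range for 8-bit video


def classify_macroblock(num_keypoints: int) -> int:
    """Map a macroblock's keypoint count to an importance group."""
    if num_keypoints >= GROUP_THRESHOLDS[0]:
        return 0
    if num_keypoints >= GROUP_THRESHOLDS[1]:
        return 1
    return 2


def assign_qps(keypoints_per_mb: Sequence[int], frame_qp: int) -> List[int]:
    """Assign a per-macroblock QP around the frame QP according to group importance."""
    offsets = [GROUP_QP_OFFSETS[classify_macroblock(n)] for n in keypoints_per_mb]
    if not offsets:
        return []
    # Re-center offsets so the mean QP stays close to the rate controller's frame QP.
    mean_offset = round(sum(offsets) / len(offsets))
    return [min(QP_MAX, max(QP_MIN, frame_qp + o - mean_offset)) for o in offsets]


# Four macroblocks with 6, 0, 2 and 0 detected keypoints, frame QP 30.
print(assign_qps([6, 0, 2, 0], frame_qp=30))
```

In this example the printed QPs are `[26, 33, 29, 33]`. Feature-rich macroblocks are quantized more finely, and flat ones more coarsely, while the mean QP stays at about 30. A complete scheme would also update the frame QP from the remaining bit budget, as the JM rate control does, so that the target bit rate is met.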