Hybrid coding of visual content and local image features

Distributed visual analysis applications, such as mobile visual search or Visual Sensor Networks (VSNs), require the transmission of visual content over a bandwidth-limited network, from a peripheral node to a processing unit. Traditionally, a “Compress-Then-Analyze” approach has been pursued, in which sensing nodes acquire and encode the pixel-level representation of the visual content, which is subsequently transmitted to a sink node for processing. This approach might not be the most effective solution: many analysis applications operate on a compact representation of the content rather than on the pixels themselves, so transmitting the full pixel-level representation uses network resources inefficiently. Furthermore, coding artifacts might significantly impair the accuracy of the visual task at hand. To tackle these limitations, an alternative approach named “Analyze-Then-Compress” has been proposed [1]. According to this paradigm, sensing nodes are responsible for extracting visual features, which are encoded and transmitted to a sink node for further processing. Despite the improved task efficiency, this paradigm leaves the central processing node unable to reconstruct a pixel-level representation of the visual content. In this paper, we propose an effective compromise between the two paradigms, namely “Hybrid-Analyze-Then-Compress” (HATC), which aims at jointly encoding visual content and local image features. Furthermore, we show how a target tradeoff between image quality and task accuracy can be achieved by carefully allocating the bitrate between visual content and local features.
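The bitrate-allocation idea in the last sentence can be illustrated with a toy sketch. This is not the coding scheme of the paper: the function names, the `feature_share` parameter, and both rate models below are hypothetical placeholders chosen only to show how a fixed bit budget might be split between pixel content and local features, and how an operating point might be picked under an image-quality constraint.

```python
import math
from typing import Callable, List, Optional, Tuple

# (feature_share, image_quality, task_accuracy)
OperatingPoint = Tuple[float, float, float]


def split_budget(total_bits: int, feature_share: float) -> Tuple[int, int]:
    """Split a bit budget into (image_bits, feature_bits)."""
    if not 0.0 <= feature_share <= 1.0:
        raise ValueError("feature_share must be in [0, 1]")
    feature_bits = int(round(total_bits * feature_share))
    return total_bits - feature_bits, feature_bits


def sweep_allocations(
    total_bits: int,
    quality_model: Callable[[int], float],
    accuracy_model: Callable[[int], float],
    steps: int = 10,
) -> List[OperatingPoint]:
    """Evaluate both models for feature shares 0, 1/steps, ..., 1."""
    points = []
    for i in range(steps + 1):
        share = i / steps
        image_bits, feature_bits = split_budget(total_bits, share)
        points.append(
            (share, quality_model(image_bits), accuracy_model(feature_bits))
        )
    return points


def pick_allocation(
    points: List[OperatingPoint], min_quality: float
) -> Optional[OperatingPoint]:
    """Return the most accurate point whose quality is at least min_quality."""
    feasible = [p for p in points if p[1] >= min_quality]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p[2])


# ASSUMPTION: made-up monotone curves, not measurements from any codec.
def toy_quality(image_bits: int) -> float:
    return 20.0 + 5.0 * math.log2(1.0 + image_bits / 1000.0)


def toy_accuracy(feature_bits: int) -> float:
    return 1.0 - math.exp(-feature_bits / 5000.0)


if __name__ == "__main__":
    curve = sweep_allocations(20000, toy_quality, toy_accuracy)
    print(pick_allocation(curve, min_quality=35.0))
```

In practice the two placeholder models would be replaced by measured rate-quality and rate-accuracy curves for the codec and feature extractor at hand; the sweep itself is just one simple way to explore the tradeoff.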

[1] João Ascenso, et al. Coding binary local features extracted from video sequences, 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[2] Eckehard G. Steinbach, et al. Preserving SIFT features in JPEG-encoded images, 2011, 2011 18th IEEE International Conference on Image Processing.

[3] Pierre Moulin, et al. A two-part predictive coder for multitask signal compression, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4] Marco Tagliasacchi, et al. Compress-then-analyze vs. analyze-then-compress: Two paradigms for image analysis in visual sensor networks, 2013, 2013 IEEE 15th International Workshop on Multimedia Signal Processing (MMSP).

[5] Pierre Vandergheynst, et al. FREAK: Fast Retina Keypoint, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Cedric Nishan Canagarajah, et al. Multiple Priority Region of Interest Coding with H.264, 2006, 2006 International Conference on Image Processing.

[7] Stefano Tubaro, et al. Coding video sequences of visual features, 2013, 2013 IEEE International Conference on Image Processing.

[8] M. Cesana, et al. Binary local descriptors based on robust hashing, 2013.

[9] Bernd Girod, et al. Location coding for mobile image retrieval, 2009, Mobimedia 2009.

[10] Mark J. Huiskes, et al. The MIR flickr retrieval evaluation, 2008, MIR '08.

[11] Vincent Lepetit, et al. Boosting Binary Keypoint Descriptors, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12] Bernd Girod, et al. Location coding for mobile image retrieval, 2009, MobiMedia.

[13] Bernd Girod, et al. Interframe Coding of Canonical patches for Low Bit-rate Mobile Augmented Reality, 2013, Int. J. Semantic Comput.

[14] Stefano Tubaro, et al. Coding Visual Features Extracted From Video Sequences, 2014, IEEE Transactions on Image Processing.

[15] Roland Siegwart, et al. BRISK: Binary Robust invariant scalable keypoints, 2011, 2011 International Conference on Computer Vision.

[16] Wen-Nung Lie, et al. Region-of-interest based rate control scheme with flexible quality on demand, 2010, 2010 IEEE International Conference on Multimedia and Expo.

[17] Bernd Girod, et al. Improved coding for image feature location information, 2012, Other Conferences.

[18] Vincent Lepetit, et al. BRIEF: Binary Robust Independent Elementary Features, 2010, ECCV.

[19] Christopher Hunt, et al. Notes on the OpenSURF Library, 2009.

[20] Gary R. Bradski, et al. ORB: An efficient alternative to SIFT or SURF, 2011, 2011 International Conference on Computer Vision.

[21] João Ascenso, et al. Rate-accuracy optimization of binary descriptors, 2013, 2013 IEEE International Conference on Image Processing.

[22] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[23] Marco Tagliasacchi, et al. Bamboo: A fast descriptor based on AsymMetric pairwise BOOsting, 2014, 2014 IEEE International Conference on Image Processing (ICIP).