Spark-SIFT: A Spark-Based Large-Scale Image Feature Extract System

Feature extraction is a critical step in image processing. With the growing popularity of content-based image retrieval, extracting features from large-scale image collections quickly has become an important problem. Among big data processing frameworks, Spark is a memory-based framework with a clear advantage in processing speed. In this paper, we design a large-scale image feature extraction framework based on Spark. The framework contains three parts: 1) the base interface for image processing, 2) the SIFT algorithm on Spark, and 3) the image sequence. Load imbalance occurs when the images to be processed differ widely in size. To solve this problem, we propose a segmentation-image feature extraction algorithm on Spark, in which a large image is segmented into several parts so that it can be processed faster. Experiments show that the framework is considerably faster than a single machine. When processing a 4 GB image set on 7 machines, the speedup reaches about 19.5. The segmentation-image feature extraction algorithm improves speed by 7.8 times when processing a 480 MB image set.
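The segmentation idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the tile size of 1024 pixels, and the 16-pixel overlap margin are all assumptions chosen for illustration. The sketch only plans the work, turning each image into one or more bounded-size tiles so that no single task is dominated by one very large image.

```python
from typing import Iterator, List, Tuple

# A tile is (image_id, x0, y0, x1, y1) in pixel coordinates, x1/y1 exclusive.
Tile = Tuple[str, int, int, int, int]


def split_into_tiles(image_id: str, width: int, height: int,
                     tile: int = 1024, overlap: int = 16) -> Iterator[Tile]:
    """Yield tiles covering a width x height image.

    `tile` and `overlap` are hypothetical parameters. The overlap gives each
    tile some context beyond its core region, on the assumption that a local
    feature detector needs neighbouring pixels near tile borders.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            x0 = max(0, x - overlap)
            y0 = max(0, y - overlap)
            x1 = min(width, x + tile + overlap)
            y1 = min(height, y + tile + overlap)
            yield (image_id, x0, y0, x1, y1)


def make_tasks(images: List[Tuple[str, int, int]],
               tile: int = 1024, overlap: int = 16) -> List[Tile]:
    """Flatten a list of (image_id, width, height) into per-tile tasks."""
    tasks: List[Tile] = []
    for image_id, width, height in images:
        tasks.extend(split_into_tiles(image_id, width, height, tile, overlap))
    return tasks


if __name__ == "__main__":
    # One small and one large image: the large one becomes many small tasks.
    images = [("small", 800, 600), ("large", 4096, 3072)]
    for task in make_tasks(images):
        print(task)
```

With these assumed parameters, the 800 x 600 image stays a single task and the 4096 x 3072 image becomes 12 tasks, so the 13 resulting work units are of comparable size. In a Spark job, a planning step like this would plausibly be expressed as a `flatMap` over the image collection followed by a `map` that runs the feature extractor on each tile. Features found in overlapping regions would also need deduplication, which this sketch does not address.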
