A Distributed CBIR System Based on Improved SURF on Apache Spark

This paper investigates the problem of image retrieval over large volumes of image data. We propose an improved Content-Based Image Retrieval (CBIR) system built on Apache Spark, a fast cluster-computing engine for large-scale data processing, to overcome shortcomings in retrieval speed and accuracy. Specifically, images are represented by binary descriptors, built with the uniform sampling pattern of Binary Robust Invariant Scalable Keypoints (BRISK), instead of the floating-point descriptors of the original SURF; binary descriptors consume less memory and accelerate retrieval. We then apply Random Sample Consensus (RANSAC) to the pre-matched point pairs to eliminate mismatches and further improve retrieval accuracy. Experimental results show that the proposed system significantly improves both retrieval speed and accuracy compared to traditional CBIR systems.
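The two matching stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hamming`, `pre_match`, `ransac_translation`), the distance threshold, and the pure-translation geometric model are assumptions chosen to keep the example short, and the abstract does not state which model RANSAC fits. Binary descriptors are stored as Python integers so that the Hamming distance reduces to an XOR followed by a bit count, which is the reason binary descriptors are cheaper to compare than floating-point ones.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def pre_match(query, database, max_dist):
    """Pair each query descriptor with its nearest database descriptor.

    Returns (query_index, database_index) pairs whose Hamming distance
    does not exceed max_dist (a hypothetical threshold).
    """
    pairs = []
    for qi, q in enumerate(query):
        di, d = min(enumerate(database), key=lambda item: hamming(q, item[1]))
        if hamming(q, d) <= max_dist:
            pairs.append((qi, di))
    return pairs


def ransac_translation(src, dst, tol=2.0, iters=100, seed=0):
    """Return indices of point pairs consistent with the best translation.

    Assumption: the two images differ by a pure translation, so a single
    pair is enough to hypothesise a model. A real system would more likely
    fit a homography or a fundamental matrix.
    """
    if not src:
        return []
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        # Hypothesise a translation from one randomly sampled pair.
        i = rng.randrange(len(src))
        dx, dy = dst[i][0] - src[i][0], dst[i][1] - src[i][1]
        # Count the pairs that agree with this translation within tol.
        inliers = [
            k for k, (s, d) in enumerate(zip(src, dst))
            if abs(s[0] + dx - d[0]) <= tol and abs(s[1] + dy - d[1]) <= tol
        ]
        if len(inliers) > len(best):
            best = inliers
    return best


if __name__ == "__main__":
    query = [0b1111, 0b0000]
    database = [0b1110, 0b0001, 0b11110000]
    print(pre_match(query, database, max_dist=2))

    # Three pairs share the shift (10, 5); the fourth is a mismatch.
    src = [(0, 0), (1, 1), (2, 0), (5, 5)]
    dst = [(10, 5), (11, 6), (12, 5), (0, 0)]
    print(ransac_translation(src, dst))
```

In a Spark deployment, the per-image work of `pre_match` would typically be the function mapped over the partitioned image database, with the RANSAC filter applied to each candidate's pre-matched pairs before ranking; that distribution step is omitted here.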
