Efficient Self-similarity Range Wide-joins Fostering Near-duplicate Image Detection in Emergency Scenarios

Crowdsourced information is increasingly used to improve and support decision making in emergency situations. However, the gathered records quickly become highly similar to one another, and handling many similar reports adds no valuable knowledge for the personnel at the control center in their decision-making tasks. The usual approaches to detecting and handling such near-duplicate data rely on costly twofold processing. To reduce this cost and improve duplicate detection, we developed a framework based on the similarity wide-join database operator. We extended the wide-join definition so that it overcomes its restrictions and also handles the near-duplicate detection task. We also provide an efficient pivot-based algorithm that speeds up the entire process and retrieves the most similar elements in a single pass. Experiments on real datasets show that our framework is up to three orders of magnitude faster than competing techniques in the literature, while also improving result quality by about 35 percent.
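As an illustration of the general idea only, the following minimal sketch shows how pivots can prune a self-similarity range join while keeping the k closest pairs in a single pass over the candidate pairs. It is not the algorithm proposed in the paper: the function name `self_range_wide_join`, its parameters, the Euclidean distance, and the pivot choice are all assumptions made for this example. The pruning relies only on the triangle inequality, which holds in any metric space.

```python
import heapq
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_range_wide_join(points, radius, k, pivots, dist=euclidean):
    """Return up to k closest pairs (d, i, j), i < j, with d <= radius.

    Hypothetical helper for illustration; not the paper's operator.
    """
    # Precompute the distance from every element to every pivot.
    table = [[dist(p, pv) for pv in pivots] for p in points]

    heap = []  # max-heap via negated distance; holds the best k pairs so far
    for i, j in combinations(range(len(points)), 2):
        # Triangle inequality: |d(a, p) - d(b, p)| <= d(a, b) for any pivot p,
        # so the largest such gap is a lower bound on the real distance.
        lower = max(
            (abs(x - y) for x, y in zip(table[i], table[j])), default=0.0
        )
        if lower > radius:
            continue  # pruned without computing the real distance

        d = dist(points[i], points[j])
        if d > radius:
            continue  # outside the range predicate

        if len(heap) < k:
            heapq.heappush(heap, (-d, i, j))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i, j))

    # Report pairs from most similar to least similar.
    return sorted((-nd, i, j) for nd, i, j in heap)


# Toy feature vectors: two near-duplicate groups and one isolated element.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (10.0, 0.0)]
pairs = self_range_wide_join(
    features, radius=1.0, k=2, pivots=[(0.0, 0.0), (10.0, 0.0)]
)
for d, i, j in pairs:
    print(f"elements {i} and {j} are near-duplicates (distance {d:.2f})")
```

The sketch still enumerates every pair, so it only saves distance computations, which is where the cost lies when the distance function is expensive. A practical implementation would also use the pivot distances to avoid enumerating most pairs in the first place.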
