Spatial Aggregation of Visual Features for Image Data Search in a Large Geo-Tagged Image Dataset

The two main requirements of searching a large image database are performance and accuracy. High-dimensional visual features are preferred for accurate similarity search, whereas index structures rely on low-dimensional features, produced by dimension reduction techniques, for performance. Most state-of-the-art indexes use low-dimensional visual descriptors to avoid the computational overhead of high dimensionality in image search, which sacrifices search accuracy. We propose a new descriptor, referred to as the Spatially-Aggregated Visual Feature Descriptor (SVD), that balances the trade-off between accuracy and performance in image search by extending the representation of an image with the feature set of multiple similar images located in its vicinity. SVD potentially preserves the visual features of images in both high- and low-dimensional spaces better than conventional visual descriptors. In an empirical evaluation on large datasets, indexing images with SVD significantly improved search accuracy compared to conventional descriptors while maintaining the same performance.
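
The abstract does not specify how the aggregation is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that "vicinity" means a fixed Euclidean radius over planar coordinates, that "similar" means the k visually closest candidates by Euclidean feature distance, and that aggregation is a plain mean. The function name `spatially_aggregate` and the parameters `radius` and `k` are hypothetical.

```python
import numpy as np


def spatially_aggregate(features, coords, radius, k):
    """Illustrative spatial aggregation (assumed scheme, not from the paper).

    For each image, average its feature vector with those of up to `k`
    visually closest images whose coordinates lie within `radius`.
    """
    features = np.asarray(features, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(features)
    aggregated = np.empty_like(features)

    for i in range(n):
        # Spatial filter: other images within `radius` of image i.
        geo_dist = np.linalg.norm(coords - coords[i], axis=1)
        candidates = np.flatnonzero((geo_dist <= radius) & (np.arange(n) != i))

        # Visual filter: keep the k candidates closest in feature space.
        vis_dist = np.linalg.norm(features[candidates] - features[i], axis=1)
        nearest = candidates[np.argsort(vis_dist)[:k]]

        # Aggregate: mean of the image's own features and its neighbors'.
        group = np.vstack([features[i][None, :], features[nearest]])
        aggregated[i] = group.mean(axis=0)

    return aggregated
```

An image with no neighbor inside the radius keeps its original feature vector. For real geo-tags, the planar distance would need to be replaced with a geodesic distance, and the brute-force loop with a spatial index.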
