Paper information - Optimised data structures for large scale content-based geo-indexing

Image mining consists of the procedures that allow users to access, search and explore very large databases. Institutions such as space agencies must manage huge archives of Earth Observation (EO) images and need solutions, both algorithmic and infrastructural, to make the data available to users. Users, in turn, need to explore the variety of images not only through metadata, such as acquisition time or sensor parameters, but also through knowledge of their content. In this contribution, we investigate methodologies for content-based EO image retrieval via example-based queries. In particular, we present a procedure for indexing large-scale unstructured archives, built on top of a cluster analytics framework, Apache Spark. The procedure is based on a hierarchical and scalable implementation of a space partitioning algorithm and allows O(log n) query response times. Scalability analyses are conducted on polarimetric data from NASA/JPL archives, using virtualized computing resources distributed over the Internet. In particular, the effects of the cluster size and of hardware scale-up are demonstrated. The results also reveal the practical potential of on-demand cloud-based resources.
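
To illustrate how a space partitioning index can answer example-based queries in roughly logarithmic time, the following is a minimal single-machine sketch. It assumes a k-d tree as the partitioning structure and 2-D feature vectors; the abstract does not name the algorithm, and the names `Node`, `build`, `sq_dist` and `nearest` are hypothetical. It does not reproduce the Spark-based implementation described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]  # one feature vector per image (assumed representation)


@dataclass
class Node:
    point: Point                    # feature vector stored at this node
    axis: int                       # coordinate used to split at this level
    left: Optional["Node"] = None   # points with a smaller value on `axis`
    right: Optional["Node"] = None  # points with a larger or equal value on `axis`


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build(ordered[:mid], depth + 1),
        right=build(ordered[mid + 1:], depth + 1),
    )


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to `query`, pruning subtrees that
    cannot contain a closer point than the best one found so far."""
    if node is None:
        return best
    if best is None or sq_dist(query, node.point) < sq_dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best


# Toy archive of 2-D feature vectors (made-up values for illustration only).
features = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
index = build(features)
print(nearest(index, (9.0, 2.0)))  # prints (8.0, 1.0)
```

Because each level halves the candidate set, a balanced tree over n points has depth of about log2(n). The pruning test lets most queries skip the far subtree, which is what gives logarithmic expected query time for low-dimensional features.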

Marco Quartulli | Igor G. Olaizola | Pietro Guccione | Giovanni Nico | Luigi Mascolo
