Optimizing Sentinel-2 image selection in a Big Data context

Abstract: Processing large volumes of image data, such as the Sentinel-2 archive, is computationally demanding. For most applications, however, many of the images in the archive are redundant and do not improve the quality of the final result. We present an optimization scheme that selects a subset of the Sentinel-2 archive in order to reduce the amount of processing while retaining the quality of the output. As a case study, we focus on the creation of a cloud-free composite covering the global land mass, based on all images acquired from January 2016 until September 2017. A total of 2,128,556 images were available. The optimal subset was selected using quicklooks, which are spatial and spectral subsets of the original Sentinel-2 products stored with lossy compression. The selected subset contained 94,093 image tiles, reducing the number of images to be processed to 4.42% of the full set.
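
The reduction quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the two counts given above. The helper name `retained_percentage` and the derived reduction factor are illustrative assumptions and are not taken from the paper.

```python
# Counts reported in the abstract.
TOTAL_IMAGES = 2_128_556   # images available, January 2016 to September 2017
SELECTED_TILES = 94_093    # image tiles kept by the selection scheme


def retained_percentage(selected: int, total: int) -> float:
    """Return the share of the archive that still has to be processed, in percent.

    Hypothetical helper written for this check; not part of the paper.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * selected / total


pct = retained_percentage(SELECTED_TILES, TOTAL_IMAGES)

# Derived figure (not stated in the abstract): how many times smaller the subset is.
reduction_factor = TOTAL_IMAGES / SELECTED_TILES

print(f"Retained share of the archive: {pct:.2f}%")
print(f"Reduction factor: about {reduction_factor:.1f}x")
```

Rounded to two decimals, the retained share matches the 4.42% stated in the abstract, which corresponds to processing roughly one image in every 22 to 23.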
