Big earth observation data analytics: matching requirements to system architectures

Earth observation satellites produce petabytes of geospatial data. To manage large data sets, researchers need stable and efficient solutions that support their analytical tasks. Since the technology for big data handling is evolving rapidly, researchers find it hard to keep up with new developments. To lower this burden, we argue that researchers should not have to convert their algorithms to specialised environments. Imposing a new API on researchers is counterproductive and slows down progress in big data analytics. This paper assesses the cost of research-friendliness in a case where the researcher has developed an algorithm in the R language and wants to use the same code for big data analytics. We take an algorithm for remote sensing time series analysis and compare its use on map/reduce and on array database architectures. While the algorithm's performance on big data sets is similar in both architectures, organising image data for processing in Hadoop is more complicated and time-consuming than handling images in SciDB. Therefore, the combination of the array database SciDB and the R language offers adequate support for researchers working on big Earth observation data analytics.
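The contrast between the two architectures can be illustrated with a minimal Python sketch. It is not the paper's implementation and uses no Hadoop or SciDB API: `analyse_series`, `mapreduce_style` and `array_style` are hypothetical names, and the analysis function is a toy stand-in for the researcher's R algorithm. In both styles the same per-pixel function is reused unchanged. The difference lies in data organisation: images stored one date at a time must first be re-keyed by pixel, shuffled and re-sorted by date in the map/reduce style, whereas an array already organised as (row, column, time) yields each pixel's time series directly as a slice.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

import numpy as np


def analyse_series(series: np.ndarray) -> float:
    # Hypothetical stand-in for the researcher's algorithm: net change between
    # the first and last observation (the result depends on temporal order).
    return float(series[-1] - series[0])


def mapreduce_style(images: Dict[int, np.ndarray],
                    fn: Callable[[np.ndarray], float]) -> Dict[Tuple[int, int], float]:
    # Map phase: each 2-D image (one per date) emits ((row, col), (date, value)).
    emitted: List[Tuple[Tuple[int, int], Tuple[int, float]]] = []
    for date, img in images.items():
        rows, cols = img.shape
        for r in range(rows):
            for c in range(cols):
                emitted.append(((r, c), (date, float(img[r, c]))))
    # Shuffle phase: group all emitted values by pixel location.
    groups: Dict[Tuple[int, int], List[Tuple[int, float]]] = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)
    # Reduce phase: restore temporal order, then apply the unchanged function.
    result: Dict[Tuple[int, int], float] = {}
    for key, values in groups.items():
        values.sort()  # sorts the (date, value) pairs by date
        series = np.array([v for _, v in values])
        result[key] = fn(series)
    return result


def array_style(cube: np.ndarray,
                fn: Callable[[np.ndarray], float]) -> np.ndarray:
    # The data already form one (row, col, time) cube, so each pixel's
    # time series is a slice along the last axis.
    return np.apply_along_axis(fn, 2, cube)


# Usage: three 2x2 images, supplied out of date order on purpose.
dates = [32, 0, 16]
images = {d: np.arange(4, dtype=float).reshape(2, 2) * (d + 1) for d in dates}
cube = np.stack([images[d] for d in sorted(dates)], axis=2)

mr = mapreduce_style(images, analyse_series)
arr = array_style(cube, analyse_series)
```

Both calls give the same per-pixel results (for example, 96.0 for pixel (1, 1), whose series is 3, 51, 99), but only the map/reduce version needs the explicit regrouping and re-sorting steps, which mirrors the extra data-organisation effort the abstract reports for Hadoop.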
