Big Earth data analytics: a survey

ABSTRACT Big Earth data are produced from satellite observations, the Internet of Things, model simulations, and other sources. The data carry unprecedented insights and spatiotemporal stamps of relevant Earth phenomena, which can improve how we understand, respond to, and address challenges in Earth sciences and applications. In recent years, new technologies (such as cloud computing, big data, and artificial intelligence) have gained momentum in addressing the challenges of using big Earth data for scientific studies and geospatial applications that were historically intractable. This paper reviews big Earth data analytics from several aspects to capture the latest advancements in this fast-growing domain. We first introduce the concepts of big Earth data. The architecture, various functionalities, and supporting modules are then reviewed from a generic methodology aspect. Analytical methods supporting the functionalities are surveyed and analyzed in the context of different tools. The driving questions are exemplified through cutting-edge Earth science research and applications. A list of challenges and opportunities is proposed for different stakeholders to collaboratively advance big Earth data analytics in the near future.
