Optical Cloud Pixel Recovery via Machine Learning

The Normalized Difference Vegetation Index (NDVI) derived from remote sensing is widely used to monitor vegetation and land use change. NDVI can be retrieved from publicly available data repositories of optical sensors such as Landsat and the Moderate Resolution Imaging Spectroradiometer (MODIS), as well as from several commercial satellites. Studies that depend heavily on optical sensors are subject to data loss due to cloud cover. In particular, cloud contamination hinders long-term environmental assessment based on satellite imagery retrieved in the visible and infrared spectral ranges. Landsat has an ongoing high-resolution NDVI record starting in 1984, but this long time series suffers from cloud contamination. Although both simple and complex interpolation methods have been applied to recover cloudy data, all of these techniques have limitations. In this paper, a novel Optical Cloud Pixel Recovery (OCPR) method is proposed to repair cloudy pixels from the time-space-spectrum continuum using a Random Forest (RF) trained and tested with multi-parameter hydrologic data. The RF-based OCPR model is compared with a linear regression model to demonstrate its capability. A case study in Apalachicola Bay is presented to evaluate the performance of OCPR in repairing cloudy NDVI values. The RF-based OCPR method achieves a root mean squared error (RMSE) of 0.016 between predicted and observed NDVI values, whereas the linear regression model achieves an RMSE of 0.126. Our findings suggest that the RF-based OCPR method is effective at repairing cloudy pixels and provides continuous and quantitatively reliable imagery for long-term environmental analysis.
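For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of how NDVI is conventionally computed from red and near-infrared reflectance, and of how a root mean squared error between predicted and observed NDVI can be evaluated. It is not the OCPR implementation; the function names `ndvi` and `rmse` and all sample values are hypothetical and chosen only for illustration.

```python
import math


def ndvi(nir, red):
    """Standard NDVI definition: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # Undefined when both bands are zero; return NaN rather than raise.
        return float("nan")
    return (nir - red) / total


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical (NIR, Red) reflectance pairs for three cloud-free pixels.
observed = [ndvi(0.5, 0.1), ndvi(0.4, 0.2), ndvi(0.3, 0.3)]

# Hypothetical model output for the same pixels (not results from the paper).
predicted = [0.65, 0.35, 0.02]

score = rmse(predicted, observed)
print(f"RMSE between predicted and observed NDVI: {score:.4f}")
```

A model comparison such as the one reported above would apply the same error measure to the predictions of each model on the same held-out pixels.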
