A data mining approach for global burned area mapping

Abstract Global burned area algorithms provide valuable information for climate modellers, since fire disturbance is responsible for a significant part of emissions and their related impact on humans. The aim of this work is to explore how four classification algorithms widely used in remote sensing, namely Random Forest (RF), Support Vector Machine (SVM), Neural Networks (NN) and a well-known decision tree algorithm (C5.0), perform when classifying burned areas at global scale through a data mining methodology using 2008 MODIS data. A training database consisting of burned and unburned pixels was created from 130 Landsat scenes. The resulting database was highly unbalanced, with the burned class representing less than one percent of the total. Therefore, the ability of the algorithms to cope with this problem was evaluated. Attribute selection was performed using three filters (Random Forest, an entropy-based filter, and logistic regression) to remove potential noise and to reduce the dimensionality of the data. Eight out of fifty-two attributes were selected, most of them related to the temporal difference of the reflectance of the bands. Models were trained using 80% of the database following a ten-fold cross-validation approach to reduce possible overfitting and to select the optimum parameters. Finally, the performance of the algorithms was evaluated over six different regions using official statistics where they were available and benchmark burned area products, namely MCD45 (V5.1) and MCD64 (V6). Compared to official statistics, the best agreement was obtained by MCD64 (OE = 0.15, CE = 0.29), followed by RF (OE = 0.27, CE = 0.21). For the remaining three areas (Angola, Sudan and South Africa), RF (OE = 0.47, CE = 0.45) yielded the best results when compared to the reference data. NN and SVM showed the worst performance, with omission and commission errors reaching 0.81 and 0.17, respectively.
SVM and NN showed higher sensitivity to unbalanced datasets, as is the case for burned area, with a clear bias towards the majority class. In contrast, tree-based algorithms are more robust to this issue, given their built-in mechanisms for dealing with large and unbalanced databases.
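
The omission error (OE) and commission error (CE) reported above are not defined in the abstract. The sketch below assumes the definitions commonly used in burned area validation: OE is the fraction of reference burned pixels that the classifier missed, and CE is the fraction of pixels classified as burned that are unburned in the reference. The function name `omission_commission` and the toy label vectors are hypothetical and serve only as illustration; they are not the authors' code or data, and the comparison against official statistics in the study may have been computed differently.

```python
def omission_commission(reference, predicted):
    """Return (OE, CE) for the burned class.

    Both inputs are equal-length sequences of 0 (unburned) and 1 (burned).
    Assumed definitions (not stated in the abstract):
        OE = FN / (TP + FN)   missed burned pixels / reference burned pixels
        CE = FP / (TP + FP)   false burned pixels / predicted burned pixels
    A ratio with a zero denominator is returned as NaN.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)

    nan = float("nan")
    oe = fn / (tp + fn) if (tp + fn) else nan
    ce = fp / (tp + fp) if (tp + fp) else nan
    return oe, ce


# Toy example: 4 burned and 6 unburned reference pixels.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(omission_commission(reference, predicted))  # (0.25, 0.25)

# A classifier biased towards the majority class: everything "unburned".
# Overall accuracy is 99%, yet every burned pixel is omitted (OE = 1.0).
rare_reference = [1] + [0] * 99
all_unburned = [0] * 100
print(omission_commission(rare_reference, all_unburned))  # (1.0, nan)
```

The second call illustrates why overall accuracy is a poor guide when the burned class is below one percent of the data: a model that never predicts "burned" looks accurate while omitting the entire class of interest, which is the majority-class bias described above for SVM and NN.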
