Active Learning in the Spatial Domain for Remote Sensing Image Classification

Active learning (AL) algorithms have proven useful for reducing the number of training samples required in remote sensing applications; however, most methods query samples pointwise without considering spatial constraints on their distribution. This often leads to a spatially dispersed distribution of training points that is unfavorable for visual image interpretation or field surveys. The aim of this study is to develop region-based AL heuristics that guide user attention toward a limited number of compact spatial batches rather than distributed points. The proposed query functions are based on a tree ensemble classifier and combine criteria of sample uncertainty and diversity to select regions of interest. Class imbalance, which is inherent to many remote sensing applications, is addressed through stratified bootstrap sampling. The proposed methods are tested empirically on multitemporal and multisensor satellite images capturing, in particular, sites recently affected by large-scale landslide events. The assessment includes an experimental evaluation of the labeling time required by the user and of the computational runtime, as well as a sensitivity analysis of the main algorithm parameters. Region-based heuristics that consider both sample uncertainty and diversity are found to outperform pointwise sampling and region-based methods that consider only uncertainty. Reference landslide inventories from five different experts enable a detailed assessment of the spatial distribution of the remaining errors and of the uncertainty of the reference data.
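To make the idea of a region-based query concrete, the following is a minimal sketch and not the study's actual query function. It assumes that each pixel already carries the class votes of a tree ensemble, scores each pixel by vote entropy, and returns the square tile with the highest mean entropy. The function names, the non-overlapping tiling, and the use of vote entropy as the uncertainty measure are illustrative assumptions; the diversity criterion and the stratified bootstrap sampling mentioned above are deliberately left out.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (in nats) of the class votes an ensemble cast for one pixel."""
    n = len(votes)
    return sum(-(count / n) * math.log(count / n) for count in Counter(votes).values())


def most_uncertain_window(vote_grid, size):
    """Return (row, col, score) of the non-overlapping size x size tile
    with the highest mean vote entropy.

    vote_grid[i][j] is the list of class labels voted by the ensemble
    members for the pixel at row i, column j (hypothetical input layout).
    """
    rows, cols = len(vote_grid), len(vote_grid[0])
    best = None
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            total = sum(
                vote_entropy(vote_grid[i][j])
                for i in range(r, r + size)
                for j in range(c, c + size)
            )
            score = total / (size * size)
            if best is None or score > best[2]:
                best = (r, c, score)
    return best


# Toy 2 x 4 image with three "trees"; "l" = landslide, "b" = background.
grid = [
    [["l", "l", "l"], ["l", "l", "l"], ["l", "b", "b"], ["l", "l", "b"]],
    [["b", "b", "b"], ["b", "b", "b"], ["l", "b", "l"], ["b", "b", "b"]],
]
row, col, score = most_uncertain_window(grid, 2)
```

In this toy input the ensemble is unanimous over the left tile and split over the right one, so the right tile (top-left corner at row 0, column 2) is proposed for labeling. A user would then label that one compact area instead of scattered individual pixels.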
