Definition of Effective Training Sets for Supervised Classification of Remote Sensing Images by a Novel Cost-Sensitive Active Learning Method

This paper proposes a novel cost-sensitive active learning (CSAL) method for defining reliable training sets for the classification of remote sensing images with support vector machines. Unlike standard active learning (AL) methods, the proposed CSAL method redefines AL by assuming that the labeling cost of samples during a ground survey is not uniform, but depends on both the accessibility of the samples and the travel time to their locations. The proposed CSAL method selects the most informative samples on the basis of three criteria: 1) uncertainty; 2) diversity; and 3) labeling cost. The labeling cost of the samples is modeled by a novel cost function that exploits ancillary data such as the road network map and the digital elevation model of the considered area. In the proposed method, the three criteria are applied in two consecutive steps. In the first step, the most uncertain samples are selected; in the second step, those uncertain samples that are diverse and have a low labeling cost are chosen. To select the uncertain samples that optimize the diversity and cost criteria, we propose two different optimization algorithms. The first algorithm is based on a sequential forward selection optimization strategy, whereas the second relies on a genetic algorithm. Experimental results show the effectiveness of the proposed CSAL method compared to standard AL methods that neglect the labeling cost.
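The two-step selection described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the uncertainty measure, the diversity measure, the cost function, or how diversity and cost are combined. The sketch assumes margin sampling on SVM decision values for uncertainty, cosine similarity for diversity, a precomputed per-sample cost vector, and a weighted sum controlled by a hypothetical parameter `lam`. The function name `select_batch` and all of its arguments are likewise illustrative.

```python
import numpy as np

def select_batch(decision_values, features, costs, m, h, lam=0.5):
    """Hypothetical two-step batch selection: uncertainty, then diversity and cost.

    decision_values: SVM decision function outputs, one per unlabeled sample.
    features:        2-D array of nonzero feature vectors, one row per sample.
    costs:           assumed precomputed labeling cost per sample.
    m:               number of uncertain candidates kept in step 1.
    h:               batch size returned by step 2 (h <= m).
    lam:             assumed weight between similarity (lam) and cost (1 - lam).
    """
    # Step 1: keep the m samples closest to the decision boundary
    # (margin sampling, an assumption of this sketch).
    uncertain = np.argsort(np.abs(decision_values))[:m]
    n = len(uncertain)

    # Rescale candidate costs to [0, 1] so they are comparable to similarities.
    c = costs[uncertain].astype(float)
    span = c.max() - c.min()
    c = (c - c.min()) / span if span > 0 else np.zeros_like(c)

    # Cosine similarity between candidates; a high value means low diversity.
    x = features[uncertain].astype(float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T

    # Step 2: sequential forward selection, starting from the cheapest candidate
    # and greedily adding the candidate with the lowest combined score.
    selected = [int(np.argmin(c))]
    while len(selected) < min(h, n):
        remaining = [i for i in range(n) if i not in selected]
        scores = [lam * sim[i, selected].max() + (1 - lam) * c[i]
                  for i in remaining]
        selected.append(remaining[int(np.argmin(scores))])

    # Map candidate positions back to indices in the unlabeled pool.
    return uncertain[selected]
```

With `lam` close to 1 the second step favors samples that differ from those already chosen; with `lam` close to 0 it favors cheap samples. A greedy search of this kind is fast but only approximates the best batch, which is one plausible reason for also considering a genetic algorithm.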
