Extending the geographic extent of existing land cover data using active machine learning and covariate shift corrective sampling

ABSTRACT Consistent land cover data provided at national and regional scales are increasingly relevant for a wide range of research topics, from landscape ecology to population dynamics. As one example, the National Land Cover Database (NLCD) is a valuable resource for research conducted at broad geographic scales across the US where survey-based land cover data are not available. However, the national extent of the NLCD (and of similar databases produced in other countries) prevents studies from reaching across borders and thus limits potential applications at broader (e.g. multinational) scales. This article presents a framework for automated spatial extrapolation of a national land cover database, such as the NLCD, using Landsat imagery alone. Extrapolating high-quality land cover data offers an efficient way to generate data of similar quality for regions not originally covered. Extending the NLCD in the spatial domain based on remote-sensing imagery alone is a domain adaptation challenge known as covariate shift, where the distribution of spectral information in the target data does not follow that of the source data. To overcome this problem, the framework implements a novel corrective sampling technique that facilitates the spatial extrapolation of land cover data. Using the corrected sample, an active machine learning routine with a maximum entropy classifier replicates the NLCD for a different geographic extent. The framework was tested in three study sites to assess its stability under different landscape conditions and the overall generalizability of the approach. When compared against reference datasets, the extended maps reached levels of overall agreement similar to those of the NLCD, showing that the NLCD can effectively be extended to new geographic extents using the proposed framework.
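As a rough illustration of the active learning step mentioned in the abstract, the sketch below shows entropy-based uncertainty sampling, one common query strategy: the pixels whose predicted class probabilities are most uncertain are selected for labeling in the next iteration. The abstract does not specify which query strategy the authors used, so this is an assumption, and the function names and the probability values are hypothetical. In practice the probabilities would come from a trained classifier such as a maximum entropy (multinomial logistic) model.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(prob_rows, k):
    """Return the indices of the k rows with the highest predictive entropy."""
    ranked = sorted(
        range(len(prob_rows)),
        key=lambda i: shannon_entropy(prob_rows[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities for four pixels over three land cover classes.
pixel_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two pixels that would be queried for labels next.
query = select_most_uncertain(pixel_probs, 2)
print(query)
```

Ranking by entropy favors pixels where the classifier spreads probability across several classes, which is where an additional label is expected to be most informative.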
