Machine Learning for Aerial Image Labeling

Information extracted from aerial photographs has found applications in a wide range of areas, including urban planning, crop and forest management, disaster relief, and climate modeling. At present, much of the extraction is still performed by human experts, making the process slow, costly, and error-prone. The goal of this thesis is to develop methods for automatically extracting the locations of objects such as roads, buildings, and trees directly from aerial images. We investigate the use of machine learning methods, trained on aerial images aligned with possibly outdated maps, for assigning a semantic label to each pixel of an aerial image. We show how deep neural networks implemented on modern GPUs can be used to efficiently learn highly discriminative image features. We then introduce new loss functions for training neural networks that are partially robust to incomplete and poorly registered target maps. Finally, we propose two ways of improving the predictions of our system by introducing structure into the outputs of the neural networks. We evaluate our system on the largest and most challenging road and building detection datasets considered in the literature and show that it works reliably under a wide variety of conditions. Furthermore, we are releasing the first large-scale road and building detection datasets to the public in order to facilitate future comparisons with other methods.
