Ground Control to Major Tom: the importance of field surveys in remotely sensed data analysis

Author(s): Bolliger, Ian; Carleton, Tamma; Hsiang, Solomon; Kadish, Jonathan; Proctor, Jonathan; Recht, Benjamin; Rolf, Esther; Shankar, Vaishaal

Abstract: In this project, we build a modular, scalable system that can collect, store, and process millions of satellite images. We apply this system in a data-rich environment to test the relative importance of the two key limitations constraining the prevailing literature: ground-truth data availability and the choice of image featurization method. To overcome classic data availability concerns, and to quantify their implications in an economically meaningful context, we work with an outcome variable directly correlated with key indicators of socioeconomic well-being. We collect public records of home sale prices within the United States, and then gradually degrade this rich sample in a range of ways that mimic the sampling strategies employed in actual survey-based datasets. Pairing each house with a corresponding set of satellite images, we use image-based features to predict housing prices within each of these degraded samples. To generalize beyond any given featurization methodology, our system contains an independent featurization module, which can be interchanged with any preferred image classification tool. Our initial findings demonstrate that while satellite imagery can be used to predict housing prices with considerable accuracy, the size and nature of the ground-truth sample are a fundamental determinant of the usefulness of imagery for this category of socioeconomic prediction. We quantify the returns to improving the distribution and size of observed data, and show that the image classification method is a second-order concern. Our results provide clear guidance for the development of adaptive sampling strategies in data-sparse locations where satellite-based metrics may be integrated with standard survey data, while also suggesting that advances in image classification techniques for satellite imagery could be further augmented by more robust sampling strategies.
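
The sample-degradation experiment described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the features here are synthetic Gaussian vectors standing in for image-derived features, the outcome is a simulated linear signal rather than real sale prices, and ridge regression is a swapped-in model chosen only for simplicity. All function names (`make_synthetic`, `ridge_fit`, `learning_curve`) are hypothetical.

```python
import numpy as np

def make_synthetic(n=5000, d=50, noise=0.5, seed=0):
    """Simulate n 'houses' with d stand-in image features and a noisy linear outcome."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w = rng.normal(size=d) / np.sqrt(d)
    y = X @ w + noise * rng.normal(size=n)
    return X, y

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: solve (X'X + lam*I) w = X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def learning_curve(X, y, sizes, n_test=1000, seed=0):
    """Degrade the training sample to each size in `sizes` (uniform random
    subsampling, an assumed degradation scheme) and report held-out R^2."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test, pool = idx[:n_test], idx[n_test:]
    results = {}
    for n in sizes:
        train = rng.choice(pool, size=n, replace=False)
        w = ridge_fit(X[train], y[train])
        results[n] = r2_score(y[test], X[test] @ w)
    return results

if __name__ == "__main__":
    X, y = make_synthetic()
    for n, r2 in learning_curve(X, y, sizes=[25, 100, 400, 4000]).items():
        print(f"n_train={n:5d}  held-out R^2={r2:.3f}")
```

Uniform random subsampling is only one way to degrade a sample; a clustered or geographically biased draw could be substituted inside `learning_curve` to mimic other survey designs.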
