Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics

The sheer volume of data generated by earth observation and remote sensing technologies continues to make a major impact, pushing key geospatial applications into an era that is both data- and compute-intensive. This rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for scaling machine learning to massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data into homogeneous distributions for fitting simple models. RESFlow takes advantage of Apache Spark and the availability of modern computing hardware to accelerate deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment in both compute- and data-intensive workloads for pixel-level labeling tasks. The pipeline invokes deep learning inference at three stages: deep feature extraction, deep metric mapping, and deep semantic segmentation. These tasks impose compute-intensive and GPU resource-sharing challenges that motivate parallelizing every execution step of the pipeline. To address hardware resource contention, our containerized workflow further incorporates a novel GPU checkout routine and a ticketing system across multiple workers. The workflow is demonstrated on NVIDIA DGX accelerated platforms and offers appreciable speed-ups for deep learning inference on pixel labeling workloads, processing 21 028 TB of imagery data and delivering output maps at an area rate of 5.245 sq.km/s, or 453 168 sq.km/day, which reduces a 28-day workload to 21 h.
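The abstract does not describe how the GPU checkout routine and ticketing system work internally. The sketch below is a rough, hypothetical illustration built on a few assumptions:

- Executor tasks on one worker are modeled as Python threads that share a fixed pool of GPU ids.
- Each task draws a first-come, first-served ticket.
- A task may check out a GPU only when its ticket is being served and a device is free.

The class `GPUCheckout`, its `checkout` method, and the `run_demo` helper are illustrative names, not RESFlow's implementation. Coordinating real Spark executors, which are separate processes, or multiple workers would also need state shared outside a single process.

```python
import threading
import time
from collections import deque
from contextlib import contextmanager


class GPUCheckout:
    """Hypothetical FIFO ticketing scheme for sharing a fixed pool of GPUs."""

    def __init__(self, gpu_ids):
        self._free = deque(gpu_ids)         # GPU ids currently available
        self._cond = threading.Condition()  # guards _free and the ticket counters
        self._next_ticket = 0               # next ticket number to hand out
        self._now_serving = 0               # ticket allowed to claim the next free GPU

    @contextmanager
    def checkout(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until it is this ticket's turn AND a GPU is free (FIFO fairness).
            while ticket != self._now_serving or not self._free:
                self._cond.wait()
            gpu = self._free.popleft()
            self._now_serving += 1
            self._cond.notify_all()         # let the next ticket holder re-check
        try:
            yield gpu                       # caller runs inference on this GPU id
        finally:
            with self._cond:
                self._free.append(gpu)      # return the GPU to the pool
                self._cond.notify_all()


def run_demo(num_tasks=8, gpu_ids=(0, 1)):
    """Run num_tasks simulated inference tasks and record any double-booked GPU."""
    pool = GPUCheckout(gpu_ids)
    in_use, log, conflicts = set(), [], []
    guard = threading.Lock()

    def task(i):
        with pool.checkout() as gpu:
            with guard:
                if gpu in in_use:           # another task already holds this GPU
                    conflicts.append((i, gpu))
                in_use.add(gpu)
            time.sleep(0.01)                # stand-in for a model forward pass
            with guard:
                in_use.discard(gpu)
                log.append((i, gpu))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, conflicts


if __name__ == "__main__":
    log, conflicts = run_demo()
    print(len(log), conflicts)
```

Issuing tickets in order gives first-come, first-served access. As a result, a waiting task cannot be starved by others that repeatedly grab a freshly released device.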

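The reported throughput figures follow from simple unit conversions:

- A day has 86,400 s, so 5.245 sq.km/s corresponds to 453,168 sq.km/day.
- Reducing a 28-day workload (672 h) to 21 h is a 32-fold reduction in wall-clock time.

The helper names below are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def daily_area(rate_km2_per_s):
    """Convert an area rate in sq.km/s to sq.km/day."""
    return rate_km2_per_s * SECONDS_PER_DAY


def speedup(baseline_hours, accelerated_hours):
    """Ratio of baseline wall-clock time to accelerated wall-clock time."""
    return baseline_hours / accelerated_hours


print(round(daily_area(5.245)))  # -> 453168
print(speedup(28 * 24, 21))      # -> 32.0
```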