Performance analysis and optimization for scalable deployment of deep learning models for country‐scale settlement mapping on Titan supercomputer

This paper presents a scalable object detection workflow for detecting objects, such as settlements, in remotely sensed (RS) imagery. We have deployed this workflow on the Titan supercomputer and used it to map human settlements at country scale. The performance of each stage of the workflow was analyzed before it was made operational. The workflow implements several strategies to address suboptimal resource utilization and long‐tail effects caused by an unbalanced image workload, data loss caused by runtime failures, and the maximum wall‐time constraints imposed by Titan's job scheduling policy. A mean shift clustering–based static load balancing strategy partitions the image load so that each partition contains similar‐sized images. In addition, a checkpoint‐restart strategy serves as a fault‐tolerance mechanism that prevents data loss from unforeseen runtime failures. The performance of these strategies was observed in several scenarios, such as node failure, exceeding the wall time, and successful completion. Using this workflow, we processed an RS data set with a spatial resolution of 0.31 m, covering 685 675 km² of the Republic of Zambia, in under six hours using 5426 nodes of the Titan supercomputer.
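The static load balancing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each image is summarized by a single scalar size (the units, the `bandwidth` value, and the function names `mean_shift_1d` and `partition_by_size` are all hypothetical), runs a flat-kernel mean shift over those sizes, and groups images whose sizes converge to the same mode into one partition.

```python
def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on scalars; returns one converged mode per value."""
    modes = [float(v) for v in values]
    for _ in range(max_iter):
        largest_move = 0.0
        shifted = []
        for m in modes:
            # Move each mode to the mean of all values within the bandwidth.
            window = [v for v in values if abs(v - m) <= bandwidth]
            new_m = sum(window) / len(window) if window else m
            largest_move = max(largest_move, abs(new_m - m))
            shifted.append(new_m)
        modes = shifted
        if largest_move < tol:
            break
    return modes


def partition_by_size(image_sizes, bandwidth):
    """Group image names so that each partition holds similar-sized images.

    image_sizes: dict mapping image name -> size (hypothetical units).
    bandwidth:   assumed tuning parameter controlling how coarse the groups are.
    """
    names = list(image_sizes)
    modes = mean_shift_1d([image_sizes[n] for n in names], bandwidth)
    order = sorted(range(len(names)), key=lambda i: modes[i])
    partitions = []
    last_mode = None
    for i in order:
        # Start a new partition when the converged mode jumps noticeably.
        if last_mode is None or modes[i] - last_mode > bandwidth / 2:
            partitions.append([])
        partitions[-1].append(names[i])
        last_mode = modes[i]
    return partitions


# Made-up sizes for illustration only.
example_sizes = {"a": 100, "b": 110, "c": 105, "d": 900, "e": 950, "f": 5000}
example_partitions = partition_by_size(example_sizes, bandwidth=200)
print(example_partitions)
```

Because the partitioning is computed once, before the job starts, each partition can be assigned to its own group of nodes, so that nodes within a group receive images of comparable cost and finish at about the same time. How the actual workflow maps partitions to nodes is not stated here.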
