Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery

We present our experiences using cloud computing to support data-intensive analytics on satellite imagery for commercial applications. Drawing on our background in high-performance computing, we draw parallels between the early days of clustered computing systems and the current state of cloud computing, with its potential to disrupt the HPC market. Using our own virtual file system layer on top of cloud remote object storage, we demonstrate an aggregate read bandwidth of 230 gigabytes per second using 512 Google Compute Engine (GCE) nodes accessing a USA multi-region standard storage bucket. This figure is comparable to the best HPC storage systems in existence. We also present several of our application results, including the identification of field boundaries in Ukraine and the generation of a global cloud-free base layer from Landsat imagery.
