How to Apply the Geospatial Data Abstraction Library (GDAL) Properly to Parallel Geospatial Raster I/O?

Input/output (I/O) of geospatial raster data often becomes the bottleneck of parallel geospatial processing because raster data are large and come in diverse formats. The open-source Geospatial Data Abstraction Library (GDAL), which is widely used to access diverse formats of geospatial raster data, has recently been applied to parallel geospatial raster processing. This article first explores the efficiency and feasibility of parallel raster I/O using GDAL under three common domain decomposition methods: row-wise, column-wise, and block-wise. Experimental results show that parallel raster I/O using GDAL performs well under row-wise domain decomposition, but is highly inefficient and fails to produce correct output under column-wise or block-wise domain decomposition. The reasons for this problem are then analyzed, and a two-phase I/O strategy is proposed to overcome it. A data redistribution module based on the proposed I/O strategy is implemented for GDAL using the message-passing-interface (MPI) programming model. Experimental results show that the data redistribution module is effective.
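
To make the three decomposition schemes concrete, the following minimal sketch computes the raster window that each process would own under each scheme. It is an illustration only, not the article's implementation: the function names `split` and `decompose`, the even-split rule, and the choice of a near-square process grid for the block-wise case are all assumptions. Windows are expressed as (xoff, yoff, xsize, ysize), the same argument order that the `ReadAsArray` method of GDAL's Python bindings accepts for a sub-window.

```python
import math


def split(length, parts):
    """Split `length` cells into `parts` contiguous (offset, size) ranges.

    Hypothetical helper: the first `length % parts` ranges get one extra cell.
    """
    base, extra = divmod(length, parts)
    ranges, offset = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((offset, size))
        offset += size
    return ranges


def decompose(width, height, nprocs, mode):
    """Return one (xoff, yoff, xsize, ysize) window per process."""
    if mode == "row":
        # Each process owns a horizontal strip spanning the full width.
        return [(0, yoff, width, ysize) for yoff, ysize in split(height, nprocs)]
    if mode == "column":
        # Each process owns a vertical strip spanning the full height.
        return [(xoff, 0, xsize, height) for xoff, xsize in split(width, nprocs)]
    if mode == "block":
        # Assumption: use the most nearly square process grid for nprocs.
        prows = max(d for d in range(1, math.isqrt(nprocs) + 1) if nprocs % d == 0)
        pcols = nprocs // prows
        return [
            (xoff, yoff, xsize, ysize)
            for yoff, ysize in split(height, prows)
            for xoff, xsize in split(width, pcols)
        ]
    raise ValueError(f"unknown decomposition mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("row", "column", "block"):
        print(mode, decompose(8, 6, 4, mode))
```

In a raster stored row by row, a row-wise window maps to one contiguous run of cells, whereas a column-wise or block-wise window touches a fragment of every row it crosses. The sketch only shows which cells each process is responsible for; it performs no file access.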