Terra Populus: Workflows for Integrating and Harmonizing Geospatial Population and Environmental Data

The Terra Populus project (TerraPop) addresses a variety of data management, curation, and preservation challenges posed by spatiotemporal population and environmental data. In this article, we describe our approaches to these challenges, with a particular focus on geospatial data workflows and associated provenance metadata. The goal of TerraPop is to enable research, learning, and policy analysis by providing integrated spatiotemporal data describing people and their environment. To do so, TerraPop is assembling a globe-spanning and temporally extensive collection of high-quality population and environmental data, ensuring that the data are well documented, and developing a Web-based data access system that lets users assemble customized, integrated data sets drawing on a variety of data sources and formats. We describe TerraPop's collection strategies, detail the geospatial workflows used to prepare data for ingest into the project database and to transform data across formats for dissemination, and discuss the system used to capture and manage provenance metadata throughout the project. A key aspect of the project is the development of global administrative unit boundaries, both current and historical, that can be linked to census data. These boundaries serve as the linchpin of TerraPop's data integration strategy and constitute an important data set in their own right.