Data Product Configuration Management and Versioning in Large-Scale Production of Satellite Scientific Data

This paper describes a formal structure for keeping track of files, source code, scripts, and related material for large-scale Earth science data production. We first describe the environment and processes that govern this configuration management problem. Then, we show that a graph with typed nodes and arcs can describe the derivation of production design and of the produced files and their metadata. The graph provides three useful by-products: • a hierarchical data file inventory structure that can help system users find particular files, • methods for creating production graphs that govern job scheduling and provenance graphs that track all of the sources and transformations between raw data input and a particular output file, •a systematic relationship between different elements of the structure and development documentation.