Composing lineage metadata with XML for custom satellite-derived data products

As peer-to-peer dissemination of custom data products evolves among Earth science research groups, investigators and data managers must consider how to compose appropriate metadata for their research computing activities. Because workflows may span multiple groups, it is critical that lineage (provenance) metadata also be assembled to document and preserve the origins and processing history of constituent data products and transformations for future data consumers. To demonstrate methods for composing lineage metadata for custom processing, we introduce our terminology for workflow and employ a case study for the creation of satellite-derived ocean color data products. Our example contributes to a general metadata model for workflow that incorporates lineage. We then discuss metadata requirements for remote sensing-related data products. We propose two techniques for composing lineage metadata, both based on accessory XML metadata documents that are paired with the data products and versioned data transformations they describe. The first technique, implemented as a prototype, features a dedicated lineage server that introduces the indirection and flexibility necessary for Web-based lineage navigation. The second, more promising technique involves defining a simple Resource Description Framework (RDF) vocabulary for lineage metadata, and using extant RDF/XML tools for query and navigation. These methods provide guidelines for composing lineage metadata that are applicable to other domains.
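As a rough illustration of the accessory-document idea, the following minimal Python sketch pairs each data product with a small XML document that records the versioned transformation that produced it and the products it was derived from, and then walks those references back to the sources. The element and attribute names (product, lineage, transformation, input, ref), the product identifiers, and the trace_lineage helper are all hypothetical and chosen only for illustration; they are not the schema or the lineage server described here, and this sketch does not use RDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical accessory metadata documents, one per data product.
# Element names, attribute names, and product ids are illustrative
# assumptions, not the schema used by the authors.
ACCESSORY_DOCS = {
    "chl_weekly_mean": """
<product id="chl_weekly_mean">
  <lineage>
    <transformation name="weekly_average" version="1.2"/>
    <input ref="chl_daily_map"/>
  </lineage>
</product>""",
    "chl_daily_map": """
<product id="chl_daily_map">
  <lineage>
    <transformation name="bin_and_map" version="0.9"/>
    <input ref="ocean_color_swath"/>
  </lineage>
</product>""",
    # A source product with no recorded lineage of its own.
    "ocean_color_swath": """
<product id="ocean_color_swath"/>""",
}


def trace_lineage(product_id, docs):
    """Walk input references from a product back to its sources.

    Returns a list of (product id, transformation name, version) tuples.
    Products with no accessory document or no lineage element are
    reported with None for the transformation fields.
    """
    steps = []
    seen = set()
    stack = [product_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles and shared inputs
        seen.add(current)
        if current not in docs:
            steps.append((current, None, None))
            continue
        root = ET.fromstring(docs[current].strip())
        lineage = root.find("lineage")
        if lineage is None:
            steps.append((current, None, None))
            continue
        transform = lineage.find("transformation")
        name = transform.get("name") if transform is not None else None
        version = transform.get("version") if transform is not None else None
        steps.append((current, name, version))
        # Queue every input product so its own accessory document is visited.
        stack.extend(inp.get("ref") for inp in lineage.findall("input"))
    return steps


if __name__ == "__main__":
    for step in trace_lineage("chl_weekly_mean", ACCESSORY_DOCS):
        print(step)
```

Keeping the lineage in a separate document per product, rather than inside the data file, means the chain can be followed by resolving references alone; a lineage server or an RDF vocabulary would replace the in-memory dictionary used in this sketch.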