Data management for earth system science

Earth system science is a relatively recent scientific discipline that seeks a global-scale understanding of the components, interactions, and evolution of the entire Earth system. The data being collected in support of Earth system science are rapidly approaching petabytes per year. The intrinsic problems of archiving, searching, and distributing such a huge dataset are compounded by both the heterogeneity of the data and the heterogeneous nature of Earth system science inquiry, which synthesizes models, observations, and knowledge bases from several traditional scientific disciplines.

A successful data management environment for Earth system science must provide seamless access to arbitrary subsets and combinations of both local and remote data, and must be compatible with the rich data analysis environments already deployed. We describe a prototype of such an environment, built at UCSB using database technology pioneered by the Sequoia 2000 Project. We specifically address its application to a problem that requires combining point observations with gridded satellite imagery.