Distributed Generation of NASA Earth Science Data Products

This work develops Grid-based approaches that let NASA data centers participate actively in serving data users by transforming archived data into the specific form each user needs. The approach generates custom data products from data stored at multiple NASA data centers. We describe a prototype built to explore how Grid technology can facilitate this multi-center product generation. Our initial example of a custom data product is phenomena-based subsetting: extracting, from a large collection of data, the subset associated with a particular phenomenon, such as a mesoscale convective system (severe storm) or a hurricane. We demonstrate that this subsetting can be performed on data located at a single data center or at multiple data centers. We also describe a system that performs customized data product generation using a combination of commodity processors deployed at a NASA data center, Grid technology to access these processors, and data mining software that intelligently selects where to perform processing based on data location and the availability of compute resources. The demonstration also suggests that a catalog of phenomena-related data could be created across multiple data centers, with catalog entries referencing the original data in their different locations. Such a catalog is important for giving other users efficient access to the data belonging to an identified phenomenon.
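The abstract gives no algorithm for choosing a processing location, so the following is only a minimal sketch of one plausible policy, not the system's actual method. It assumes a hypothetical `Site` record and two hypothetical helpers, `choose_site` and `plan_transfers`: among sites with enough idle processors, prefer the one that already holds the most of the needed data, and record where any remaining data would have to be fetched from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one data center (not from the paper)."""
    name: str
    free_cpus: int        # idle processors currently available
    granules: frozenset   # identifiers of data granules archived here


def choose_site(sites, needed, min_cpus=1):
    """Return the site with enough free CPUs that holds the most needed
    granules locally; ties are broken by the number of free CPUs.
    Returns None when no site has enough free CPUs."""
    candidates = [s for s in sites if s.free_cpus >= min_cpus]
    if not candidates:
        return None
    return max(candidates,
               key=lambda s: (len(s.granules & needed), s.free_cpus))


def plan_transfers(site, sites, needed):
    """Map each needed granule missing at `site` to the name of the first
    site that archives it, or to None if no listed site holds it."""
    plan = {}
    for granule in sorted(needed - site.granules):
        plan[granule] = next(
            (s.name for s in sites if granule in s.granules), None)
    return plan


if __name__ == "__main__":
    # Illustrative values only; site and granule names are made up.
    sites = [
        Site("A", 4, frozenset({"g1", "g2"})),
        Site("B", 0, frozenset({"g1", "g2", "g3"})),
        Site("C", 8, frozenset({"g3"})),
    ]
    needed = {"g1", "g2", "g3"}
    chosen = choose_site(sites, needed)
    print(chosen.name, plan_transfers(chosen, sites, needed))
```

In this sketch, data locality outranks raw compute capacity, which reflects the assumption that moving large archived data is costlier than waiting for fewer processors; a real scheduler would likely weigh transfer cost and queue time explicitly.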
