A Testbed Demonstration of an Intelligent Archive in a Knowledge Building System

The last decade's influx of raw data and derived geophysical parameters from several Earth observing satellites to NASA data centers has created a data-rich environment for Earth science research and applications. Advances in hardware and information management have made it possible to archive petabytes of data and distribute terabytes of data daily to a broad community of users. To realize the full potential of these valuable datasets, however, further progress is necessary in transforming data into information, and information into knowledge that can be used in particular applications. In examining what is needed to enable this progress in the data provider environment that exists today and is expected to evolve over the next several years, we arrived at the concept of an Intelligent Archive in the context of a Knowledge Building System (IA/KBS). Our prior work and associated papers investigated usage scenarios, required capabilities, system architecture, data volume issues, and supporting technologies. We identified six key capabilities of an IA/KBS: Virtual Product Generation, Significant Event Detection, Automated Data Quality Assessment, Large-Scale Data Mining, Dynamic Feedback Loop, and Data Discovery and Efficient Requesting. Among these capabilities, large-scale data mining is perceived by many in the community to be an area of technical risk. One of the main reasons is that standard data mining research and algorithms operate on datasets several orders of magnitude smaller than those maintained by real Earth science data archives. Therefore, we defined a testbed activity to implement a large-scale data mining algorithm in a pseudo-operational scale environment and to examine any issues involved. The application chosen for the data mining algorithm is wildfire prediction over the continental U.S. This paper reports a number of observations based on our experience with this testbed.
While proof of concept for data mining scalability and utility was a major goal of the research reported here, it was not the only one. The other five capabilities of an IA/KBS named above were considered as well, and an assessment of the implications of our experience for these other areas is also presented. The lessons learned through the testbed effort and presented in this paper will benefit technologists, scientists, and system operators as they consider introducing IA/KBS capabilities into production systems.