Mining science data

The data from scientific simulations, observations, and experiments are now measured in terabytes and will soon reach the petabyte regime. The size and complexity of the data make it difficult to find useful information in them. This is disconcerting to scientists, who wonder about the science still undiscovered in the data. The Sapphire scientific data mining project is addressing this concern by applying data mining techniques to problems ranging in size from a few megabytes to a hundred terabytes in a variety of domains. In this paper, we briefly describe our work in several applications, including the identification of key features for edge harmonic oscillations in the DIII-D tokamak, classification of orbits in a Poincaré plot, and tracking of features of interest in experimental images.