An LLNL perspective on ASCI data mining and pattern recognition requirements

This working document was put together by the members of the Sapphire project at LLNL. The goal of Sapphire is to apply and extend techniques from data mining and pattern recognition to automatically detect areas of interest in very large data sets. The intent is to help scientists address the problem of data overload by giving them effective and efficient ways to explore and analyze massive data sets. One of the key areas where the project expects this technology to be used is the analysis of output from ASCI simulations. A simulation running on the 100-Tflop ASCI machine in 2004 is expected to produce data at a rate of 12 TB/hour; at that rate, a petabyte accumulates in roughly three and a half days of continuous running. Given the difficulties already encountered in analyzing and visualizing a terabyte of data, it is imperative to start planning now for ways to make the analysis of petabyte data sets feasible. This document focuses on the relevance of data mining and pattern recognition to ASCI, discusses potential applications of these techniques in ASCI, and identifies research issues that arise when algorithms from these areas are applied to massive data sets.