Active Data Mining of Correspondence for Qualitative Assessment of Scientific Computations

Active data mining constructs and evaluates candidate models that explain a dataset, and reasons about the cost and impact of additional samples on refining and selecting among those models. It is particularly appropriate for applications in which data collection is expensive, whether from experiment or simulation. This paper develops an active mining mechanism based on a multi-level, qualitative analysis of correspondence. The correspondence operators presented here leverage domain knowledge to establish relationships among objects, evaluate implications for model selection, and use identified weaknesses to focus additional data collection. The utility of the qualitative framework is demonstrated in two scientific computing applications: matrix spectral portrait analysis and graphical assessment of Jordan forms of matrices. Results show that the mechanism efficiently samples computational experiments and successfully uncovers high-level properties of data. The framework helps overcome noise and sparsity by leveraging domain knowledge to detect mutually reinforcing interpretations of spatial data.
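
To make the first application concrete, the sketch below computes a spectral portrait by brute force on a dense grid. It is not the paper's mechanism: the abstract does not define the portrait, so the code assumes the common definition, the map from z to log10(||A||_2 · ||(A − zI)^{-1}||_2), and the function name `spectral_portrait`, the grid bounds, and the 3×3 Jordan block test matrix are all illustrative choices. Each grid point costs one singular value decomposition, which is the kind of expensive sample that an active mining approach would try to place selectively instead of exhaustively.

```python
import numpy as np

def spectral_portrait(A, re_range, im_range, n=50):
    """Dense-grid spectral portrait (hypothetical helper, assumed definition).

    Returns the grid axes and an n x n array holding
    log10(||A||_2 * ||(A - zI)^-1||_2) at each grid point z = x + iy.
    """
    A = np.asarray(A, dtype=complex)
    norm_A = np.linalg.norm(A, 2)          # spectral norm of A
    xs = np.linspace(re_range[0], re_range[1], n)
    ys = np.linspace(im_range[0], im_range[1], n)
    identity = np.eye(A.shape[0])
    portrait = np.empty((n, n))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            z = x + 1j * y
            # The resolvent 2-norm is the reciprocal of the smallest
            # singular value of (A - zI); singular values are sorted
            # in descending order, so take the last one.
            s_min = np.linalg.svd(A - z * identity, compute_uv=False)[-1]
            # Guard against division by zero exactly at an eigenvalue.
            portrait[i, j] = np.log10(norm_A / max(s_min, 1e-300))
    return xs, ys, portrait

# Illustrative input: a 3x3 Jordan block with eigenvalue 1.
J = np.diag(np.ones(3)) + np.diag(np.ones(2), k=1)
xs, ys, P = spectral_portrait(J, (0.0, 2.0), (-1.0, 1.0), n=41)
```

The portrait peaks sharply at the eigenvalue z = 1, and the level curves around that peak are the spatial data a qualitative analysis would interpret. A dense 41×41 grid needs 1681 decompositions even for this tiny matrix, which illustrates why the cost of each additional sample matters.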
