Expectation Propagation in ExGen Graphs for Summarization: Preliminary Report

Summarization based on expected distribution domain generalization (ExGen) graphs aggregates data into summaries in many ways and identifies interesting summaries, i.e., those that are far from user expectations. In this paper, we tackle two problems. First, we propose a method for consistently propagating an expected distribution, given by the user for one node, to the entire ExGen graph. Second, we propose three interestingness measures and, based on them, heuristics for pruning nodes from the ExGen graph while searching for interesting summaries. We also demonstrate the interactive experimental process of our method and present the results obtained by applying it to Saskatchewan weather data.
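The abstract does not state the propagation rules or the three interestingness measures, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that an expected distribution at a fine-grained node can be carried to a more general node by summing probability mass along a value-to-value generalization mapping, and it uses Kullback-Leibler divergence as a stand-in for an interestingness measure. The function names, the month-to-season mapping, and all numbers are hypothetical.

```python
import math
from collections import defaultdict


def propagate_up(expected, generalize):
    """Carry an expected distribution from a fine node to a coarser node.

    Assumption: each fine value maps to exactly one coarse value, so the
    coarse expectation is the sum of the fine probabilities mapped to it.
    """
    coarse = defaultdict(float)
    for value, prob in expected.items():
        coarse[generalize[value]] += prob
    return dict(coarse)


def kl_divergence(observed, expected):
    """KL(observed || expected) in bits.

    Stand-in measure only, not one of the paper's three measures.
    Assumes expected[k] > 0 wherever observed[k] > 0.
    """
    return sum(
        p * math.log2(p / expected[k]) for k, p in observed.items() if p > 0
    )


# Hypothetical expectation given by the user at the "month" node.
expected_month = {"Dec": 0.25, "Jan": 0.25, "Jun": 0.25, "Jul": 0.25}
# Hypothetical generalization edge: month -> season.
month_to_season = {
    "Dec": "Winter", "Jan": "Winter", "Jun": "Summer", "Jul": "Summer",
}

expected_season = propagate_up(expected_month, month_to_season)
# Hypothetical observed summary at the "season" node.
observed_season = {"Winter": 0.75, "Summer": 0.25}

score = kl_divergence(observed_season, expected_season)
print(expected_season, round(score, 4))
```

Under these assumptions, a summary whose observed distribution matches the propagated expectation scores zero, and larger scores mark summaries that deviate more from what the user expected; a pruning heuristic could then skip nodes whose score falls below a chosen threshold.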
