Discovering Surprising Instances of Simpson's Paradox in Hierarchical Multidimensional Data

This paper addresses the discovery of surprising patterns using a data mining method that detects instances of Simpson’s paradox. By their very nature, instances of this paradox tend to be surprising to the user. Previous work proposed an algorithm for discovering instances of the paradox, but it addressed only flat data stored in a single relation. This work proposes a novel algorithm that considerably extends that previous work by discovering instances of Simpson’s paradox in hierarchical multidimensional data, the kind of data typically found in data warehouse and OLAP environments. Hence, the proposed algorithm can be regarded as integrating the areas of data mining and data warehousing: it uses an adapted data mining technique to discover surprising patterns in data warehouse and OLAP environments.
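
To make concrete what an "instance of Simpson's paradox" means, the following is a minimal sketch of the check on flat data. It is not the paper's algorithm. The function name `is_simpson_instance`, the input layout, and the counts in the usage example are illustrative assumptions and do not come from the paper. The sketch compares two populations on a success rate: an instance occurs when one population has the higher rate in every partition, yet the lower rate once the partitions are aggregated.

```python
from fractions import Fraction


def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_simpson_instance(strata):
    """Check one candidate partitioning for Simpson's paradox.

    `strata` is a list of ((s1, n1), (s2, n2)) pairs, one per partition,
    where s/n are success and total counts for populations 1 and 2.
    (Hypothetical layout, chosen only for this illustration.)
    """
    # Direction of the rate difference inside each partition.
    # Fractions avoid floating-point rounding when rates are close.
    signs = [
        sign(Fraction(s1, n1) - Fraction(s2, n2))
        for (s1, n1), (s2, n2) in strata
    ]

    # Direction of the rate difference after aggregating all partitions.
    total_s1 = sum(s1 for (s1, _), _ in strata)
    total_n1 = sum(n1 for (_, n1), _ in strata)
    total_s2 = sum(s2 for _, (s2, _) in strata)
    total_n2 = sum(n2 for _, (_, n2) in strata)
    agg_sign = sign(Fraction(total_s1, total_n1) - Fraction(total_s2, total_n2))

    # Paradox: every partition agrees on a nonzero direction,
    # and the aggregate points the opposite way.
    consistent = signs[0] != 0 and all(s == signs[0] for s in signs)
    return consistent and agg_sign == -signs[0]


# Illustrative counts (not from the paper): population 1 wins in both
# partitions (81/87 > 234/270 and 192/263 > 55/80) but loses overall
# (273/350 < 289/350).
paradox = [((81, 87), (234, 270)), ((192, 263), (55, 80))]
# Illustrative counts where partitions and aggregate agree.
no_paradox = [((8, 10), (5, 10)), ((6, 10), (4, 10))]

print(is_simpson_instance(paradox))
print(is_simpson_instance(no_paradox))
```

In a hierarchical multidimensional setting, one could imagine repeating a check of this kind with partitions taken at different levels of a dimension hierarchy. That extension is an assumption made here for illustration, not a description of the algorithm the paper proposes.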
