Integrating Bayesian networks and Simpson's paradox in data mining

This paper proposes to integrate two very different kinds of methods for data mining, namely the construction of Bayesian networks from data and the detection of occurrences of Simpson’s paradox. The former aims at discovering potentially causal knowledge in the data, whilst the latter aims at detecting surprising patterns in the data. By integrating these two kinds of methods we aim to discover patterns that are more likely to be useful to the user, a challenging data mining goal which is under-explored in the literature. The proposed integration method involves two approaches. The first approach uses the detection of occurrences of Simpson’s paradox as a preprocessing step for a more effective construction of Bayesian networks, whilst the second approach uses the construction of a Bayesian network from data as a preprocessing step for the detection of occurrences of Simpson’s paradox.
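
To make the notion of "detecting an occurrence of Simpson's paradox" concrete, the following is a minimal sketch in Python, not the detection algorithm of this paper. It assumes the simplest setting: two groups compared on a success rate, with the data partitioned by one candidate confounding attribute. The function name `is_simpsons_paradox`, the data layout, and the counts (patterned on the widely cited kidney-stone treatment example) are illustrative assumptions.

```python
from typing import Dict, Tuple

# Assumed layout: stratum -> group -> (successes, trials)
Strata = Dict[str, Dict[str, Tuple[int, int]]]


def rate(successes: int, trials: int) -> float:
    """Success rate of one group within one stratum (or in aggregate)."""
    return successes / trials


def is_simpsons_paradox(strata: Strata, group_a: str, group_b: str) -> bool:
    """Return True if one group has the higher rate in every stratum
    but the lower rate once the strata are pooled."""
    # Pool successes and trials over all strata for each group.
    pooled = {}
    for group in (group_a, group_b):
        successes = sum(strata[s][group][0] for s in strata)
        trials = sum(strata[s][group][1] for s in strata)
        pooled[group] = rate(successes, trials)

    # Compare the two groups inside every stratum.
    a_wins_everywhere = all(
        rate(*strata[s][group_a]) > rate(*strata[s][group_b]) for s in strata
    )
    b_wins_everywhere = all(
        rate(*strata[s][group_a]) < rate(*strata[s][group_b]) for s in strata
    )

    # The paradox is a reversal between the per-stratum and pooled orderings.
    if a_wins_everywhere and pooled[group_a] < pooled[group_b]:
        return True
    if b_wins_everywhere and pooled[group_a] > pooled[group_b]:
        return True
    return False


# Illustrative counts: treatment A is better for small and for large stones,
# yet treatment B looks better when stone size is ignored.
kidney_stones: Strata = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

if __name__ == "__main__":
    print(is_simpsons_paradox(kidney_stones, "A", "B"))
```

In this sketch the partitioning attribute (stone size) is given by hand. In the integration described above, a learned Bayesian network could instead suggest which attributes are worth testing as confounders, and detected reversals could in turn flag attributes that the network construction should take into account.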
