Predictive Data Mining with Finite Mixtures

In data mining the goal is to develop methods for discovering previously unknown regularities from databases. The resulting models are interpreted and evaluated by domain experts, but a model evaluation criterion is also needed for the model construction process itself. The optimal choice would be to use the same criterion as the human experts, but this is usually impossible, as the experts cannot express their evaluation criteria formally. On the other hand, it seems reasonable to assume that any model capable of making good predictions also captures some structure of reality. For this reason, in predictive data mining the search for good models is guided by their expected predictive error. In this paper we describe the Bayesian approach to predictive data mining in the finite mixture modeling framework. The finite mixture model family is a natural choice for domains where the data exhibit a clustering structure. In many real-world domains this seems to be the case, as demonstrated by our experimental results on a set of public-domain databases.
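As a concrete illustration of the kind of model the abstract refers to (not code from the paper itself), the sketch below fits a two-component Gaussian finite mixture to one-dimensional data with the EM algorithm. The synthetic data, the component count, and the quantile-based initialisation are all illustrative assumptions; the paper's own models are discrete finite mixtures evaluated by Bayesian criteria.

```python
import numpy as np

def em_gaussian_mixture(x, k=2, n_iter=50):
    """Fit a 1-D Gaussian mixture with k components via EM.

    Returns mixing weights, component means, and component variances.
    Initialisation (data quantiles, global variance) is an illustrative choice.
    """
    n = len(x)
    w = np.full(k, 1.0 / k)                       # equal mixing weights
    mu = np.quantile(x, (np.arange(k) + 1) / (k + 1))  # spread means over the data
    var = np.full(k, np.var(x))                   # start from the global variance
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = (w / np.sqrt(2 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Two well-separated synthetic clusters.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
w, mu, var = em_gaussian_mixture(x)
```

On data with a clear clustering structure like this, the recovered means land near the true cluster centres and the mixing weights near the true proportions; in a predictive setting, the number of components k would be chosen by an estimate of predictive error rather than fixed in advance.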
